REGULATORS OF G-PROTEIN SIGNALLING
Background of the Invention
The invention relates to regulators of
heterotrimeric G-protein mediated events and uses thereof
to mediate cell signalling and membrane trafficking.

The heterotrimeric guanine nucleotide binding
proteins (G proteins) are intracellular proteins best
known for their role as transducers of binding by
extracellular ligands to seven transmembrane receptors
(7-TMRs) located on the cell surface. Individual 7-TMRs
have been identified for many small neurotransmitters
(e.g. adrenaline, noradrenaline, dopamine, serotonin,
histamine, acetylcholine, GABA, glutamate, and
adenosine), for a variety of neuropeptides and hormones
(e.g. opioids, tachykinins, bradykinins, releasing
hormones, vasoactive intestinal peptide, neuropeptide Y,
thyrotrophic hormone, luteinizing hormone, follicle-
stimulating hormone, adrenocorticotropic hormone,
cholecystokinin, gastrin, glucagon, somatostatin,
endothelin, vasopressin and oxytocin) as well as for
chemoattractant chemokines (C5a, interleukin-8, platelet-
activating factor and the N-formyl peptides) that are
involved in immune function. In addition, the odorant
receptors present on vertebrate olfactory cells are
7-TMRs, as are rhodopsins, the proteins that transduce
visual signals.

Ligand binding to 7-TMRs produces activation of
one or more heterotrimeric G-proteins. A few proteins
with structures that are dissimilar to the 7-TMRs have
also been shown to activate heterotrimeric G-proteins.
These include the amyloid precursor protein, the terminal
complement complex, the insulin-like growth
factor/mannose 6-phosphate receptor and the ubiquitous
brain protein GAP-43. Dysregulation of G-protein coupled
pathways is associated with a wide variety of diseases,
including diabetes, hyperplasia, psychiatric disorders,
cardiovascular disease, and possibly Alzheimer's disease.
Accordingly, the 7-TMRs are targets for a large number of
therapeutic drugs: for example, the β-adrenergic
blockers used to treat hypertension target 7-TMRs.

Unactivated heterotrimeric G-proteins are
complexes comprised of three subunits, Gα, Gβ and Gγ.
The subunits are encoded by three families of genes; in
mammals there are at least 15 Gα, 5 Gβ and 7 Gγ genes.
Additional diversity is generated by alternate splicing.
Where it has been studied, a similar multiplicity of G-
proteins has been found in invertebrate animals.
Mutations within Gα subunit genes are involved in the
pathophysiology of several human diseases: mutations of
Gα that activate Gs or Gi2 are observed in some endocrine
tumors and are responsible for McCune-Albright syndrome,
whereas loss-of-function mutations of Gαs are found in
Albright hereditary osteodystrophy.

The Gα subunits have binding sites for a guanine
nucleotide and intrinsic GTPase activity. This structure
and associated mechanism are shared with the monomeric
GTP-binding proteins of the ras superfamily. Prior to
activation the complex contains bound GDP: GαGDPβγ.
Activation involves the catalyzed release of GDP followed
by binding of GTP and concurrent dissociation of the
complex into two signalling complexes: GαGTP and βγ.
Signalling through GαGTP, the more thoroughly
characterized pathway, is terminated by GTP hydrolysis to
GDP. GαGDP then reassociates with βγ to reform the
inactive, heterotrimeric complex.

The mammalian G-proteins are divided into four
subtypes: Gs, Gi/Go, Gq and G12. This typing is based
on the effect of activated G-proteins on enzymes that
generate second messengers and on their sensitivity to
cholera and pertussis toxin. These divisions also appear
to be evolutionarily ancient: there are comparable
subtypes in invertebrate animals. Members of two
subtypes of G-proteins control the activity of adenylyl
cyclases (ACs). Activated Gs proteins increase the
activity of ACs whereas activated Gi proteins (but not
Go) inhibit these enzymes. Gs proteins are also uniquely
activated by cholera toxin. ACs are the enzymes
responsible for the synthesis of cyclic adenosine
monophosphate (cAMP). cAMP is a diffusible second
messenger that acts through cAMP-dependent protein
kinases (PKAs) to phosphorylate a large number of target
proteins. Members of two subtypes, all Gi/Go proteins and
the Gq proteins, increase the activity of inositol
phospholipid-specific phospholipases (IP-PLCs). The
activities of the subtypes are distinguishable: activation
of Gi and Go is blocked by pertussis toxin whereas Gq is
resistant to this compound. IP-PLCs release two
diffusible second messengers, inositol triphosphate (IP3)
and diacylglycerol (DAG). IP3 modulates intracellular
Ca2+ concentration whereas DAG activates protein kinase Cs
(PKCs) to phosphorylate many target proteins. The second
messenger cascades allow signals generated by G-protein
activation to have global effects on cellular physiology.

Activation of G proteins frequently modulates ion
conductance through plasma membrane ion channels.
Although in some cases these effects are indirect, as a
result of changes in second messengers, G-proteins can
also couple directly to ion channels. This phenomenon is
known as membrane delimited modulation. The opening of
inwardly rectifying K channels by activated Gi/Go and of
N and L type Ca channels by Gi/Go and Gq are commonly
observed forms of membrane delimited modulation.

Heterotrimeric G proteins appear to have other
cellular roles, in addition to transducing the binding of
extracellular ligands. Analysis of the intracellular
localization of the various G-protein subunits combined
with pharmacological studies suggests, for example, that G
proteins are involved in intracellular membrane
trafficking. Indeed, some workers hypothesize that G
proteins evolved to control membrane trafficking and that
their role in transducing extracellular signals evolved
later. Studies implicate heterotrimeric G-proteins in
the formation of vesicles from the trans-Golgi network,
in transcytosis in polarized epithelial cells and in the
control of secretion in many cells, including several
model systems relevant to human disease: mast cells,
chromaffin cells of the adrenal medulla and human airway
epithelial cells. Nonetheless, the G-protein subunits
involved in membrane trafficking and secretion have yet
to be definitively established and the mechanisms by
which they are activated and control membrane trafficking
remain largely unknown.

Caenorhabditis elegans (reviewed in Wood, et al.
(1988) The Nematode Caenorhabditis elegans, Cold Spring
Harbor Press, Cold Spring Harbor, NY) is a small free-
living nematode which grows easily and reproduces rapidly
in the laboratory. The adult C. elegans has about 1000
somatic cells (depending on the sex). The anatomy of C.
elegans is relatively simple and extremely well-known,
and its developmental cell lineage is highly reproducible
and completely determined. There are two sexes:
hermaphrodites that produce both eggs and sperm and are
capable of self fertilization and males that produce
sperm and can productively mate with the hermaphrodites.
The self fertilizing mode of reproduction greatly
facilitates the isolation and analysis of genetic
mutations and C. elegans has developed into a most
powerful animal model system. In addition, C. elegans has
a small genome (~10^8 base pairs) whose sequencing is
more advanced than that of any other animal.

Genes that encode G-protein subunits in C. elegans
were identified using probes to sequences conserved in
corresponding mammalian genes. So far six Gα genes have
been identified including the nematode homologs of
mammalian Gαs, Gαo and Gαq/11 as well as three putative
Gα proteins that have not yet been assigned to a
mammalian subtype class. Gαo is encoded by the gene
goa-1. The Gαo protein from C. elegans is 80-87% identical
to homologous proteins from other species. Mutations
that reduce the function of goa-1 cause behavioral
defects in C. elegans including hyperactive locomotion,
premature egg-laying, inhibition of pharyngeal pumping,
male impotence, a reduction in serotonin-induced
inhibition of defecation and reduced fertility.
Mutations of goa-1 homologous to the known activating
mutations of mammalian Gαs and Gαi2 or overexpression of
wild type goa-1 caused behavioral defects which appear to
be opposite to those conferred by reducing goa-1
function: sluggish locomotion, delayed egg-laying and
hyperactive pharyngeal pumping.

egl-10 is a gene from C. elegans, originally
identified by mutations that cause defects in egg-laying
behavior (C. Trent, N. Tsung and H.R. Horvitz (1983)
Genetics 104:619-647). The egg-laying defect appears to
involve a pair of serotonergic motor neurons (the HSN
cells) which innervate vulva muscles in C. elegans
hermaphrodites (C. Desai, G. Garriga, S.L. McIntire and
H.R. Horvitz (1988) Nature 336:638-646; C. Desai and H.R.
Horvitz (1989) Genetics 121:703-721).

Summary of the Invention
We have discovered a new family of proteins
involved in the control of heterotrimeric G-protein
mediated effects in both mammalian and non-mammalian
cells. We disclose sequences which comprise the conserved
domains of nine members of this family and methods for
identifying additional members. We have named this
family of proteins RGS proteins for Regulators of G-
protein Signalling.

In general, the invention features substantially
pure nucleic acid (for example, genomic DNA, cDNA, RNA or
synthetic DNA) encoding an RGS polypeptide as defined
below. In related aspects, the invention also features a
vector, a cell (e.g., a bacterial, yeast, nematode, or
mammalian cell), and a transgenic animal which includes
such a substantially pure DNA encoding an RGS
polypeptide.

In preferred embodiments, an rgs gene is the
egl-10 gene of the nematode C. elegans or the
human homolog, rgs7. In another preferred embodiment,
the RGS encoding nucleic acid is in a transformed
animal cell. In related aspects, the invention features
a transgenic animal containing a transgene which encodes
an RGS polypeptide that is expressed in animal cells
which undergo G-protein mediated events (for example,
responses to neuropeptides, hormones, chemoattractant
chemokines, odor, and synthetic or naturally occurring
opiates).

In a second aspect, the invention features a
substantially pure DNA which includes a promoter capable
of expressing the rgs gene in a cell. In preferred
embodiments, the promoter is the promoter native to an
rgs gene. Additionally, transcriptional and
translational regulatory regions are preferably native to
an rgs gene.

In another aspect, the invention features a method
of detecting an rgs gene in a cell involving: (a)
contacting the rgs gene or a portion thereof greater than
9 nucleic acids, preferably greater than 18 nucleic acids
in length with a preparation of genomic DNA from the cell
under hybridization conditions providing detection of DNA
sequences having about 30% or greater sequence identity
among the amino acid sequences encoded by the conserved
DNA sequences of Fig. 3B or the sequences of SEQ ID
NOS: 2-5 and the nucleic acid of interacting.
Preferably, the region of sequence identity used for
hybridization is the DNA sequence encoding one of the
sequences in the shaded region depicted in Fig. 3B (e.g.,
the DNA encoding amino acids 1-43 and 92-120 of the
EGL-10 fragment shown in Figure 3B (SEQ ID NO: 1)). More
preferably, the region of identity is to the DNA encoding
the polypeptide sequence delineated by the solid black in
Fig. 3B (e.g., amino acids 36-43 and 92-102 of the EGL-10
sequence shown in Fig. 3B). Even more preferably the
sequence identity is to the sequences of SEQ ID NOS: 1-5.
Most preferably, the sequence identity is to the
sequences of SEQ ID NOS: 33 or 34. Most preferably, the
sequence identity of the nucleic acid sequences being
compared is 50%.

In another aspect, the invention features a method
of producing an RGS polypeptide which involves: (a)
providing a cell transformed with DNA encoding an RGS
polypeptide positioned for expression in the cell (for
example, present on a plasmid or inserted in the genome
of the cell); (b) culturing the transformed cell under
conditions for expressing the DNA; and (c) isolating the
RGS polypeptide.

In another aspect, the invention features
substantially pure RGS polypeptide. Preferably, the
polypeptide includes a greater than 50 amino acid
sequence substantially identical to a greater than 50
amino acid sequence shown in the Fig. 2 open reading
frame; more preferably, the identity is to one of the
conserved regions of homology shown in Fig. 3B (e.g., the
sequences 1-43 and 92-120) and, more preferably, 36-43
and 92-102 of SEQ ID NO: 1; and most preferably, the
identity is to one of the sequences shown in SEQ ID NOS:
2-5.

In another aspect, the invention features a method
of regulating G-protein mediated events wherein the
method involves: (a) providing the rgs gene under the
control of a promoter providing controllable expression
of the rgs gene in a cell wherein the rgs gene is
expressed in a construct capable of delivering an RGS
protein in an amount effective to alter said G-protein
mediated events. The polypeptide may also be provided
directly, for example, in cell culture and therapeutic
uses. In preferred embodiments, the rgs gene is
expressed using a tissue-specific or cell type-specific
promoter, or by a promoter that is activated by the
introduction of an external signal or agent, such as a
chemical signal or agent.

In other aspects, the invention features a
substantially pure oligonucleotide including one or a
combination of the sequences:

5' GNIGANAARYTIGANTTRTGG 3', wherein N is G or A;
R is T or C; and Y is A, T, or C (SEQ ID NO: 2);

5' GNIGANAARYTISGITTRTGG 3', wherein N is G or A;
R is T or C; Y is A, T, or C; and S is A or C (SEQ ID NO:
3);

5' GNTAIGANTRITTRTRCAT 3', wherein N is G or A;
and R is T or C (SEQ ID NO: 4);

5' GNTANCTNTRITTRTRCAT 3', wherein N is G or A;
and R is T or C (SEQ ID NO: 5);

the egl-10 DNA shown in Fig. 2A (SEQ ID NO: 27);

ATCAGCTGTGAGGAGTACAAGAAAATCAAATCACCTTCTAAACTAAGTCCCAAGGC
CAAGAAGATCTACAATGAGTTCATCTCTGTGCAGGCAACAAAAGAGGTGAACCTGG 
ATTCTTGCACCAGAGAGGAGACAAGCCGGAACATGTTAGAGCCCACGATAACCTGT 
TTTGATGAAGCCCGGAAGAAGATTTTCAACCTG (SEQ ID NO: 15);

CAGCTTGTAAATGTGCTCCTGAGCATCTTCGAATGTGTATCGTCCTGGTTCCTTCAC
ATTCTGTGTGGTCTTGTCATAACTCTTCGAATCCAAGTTAATGGCACTGGGGGCCCC 
CGGAGCCAGAAATTCTTGCCATATTTCCTGTACTCGAGAGGGGACCTCTCGGATAG
GCCTTTTCTTCAGGTCCTCCACTGCCAA (SEQ ID NO: 16);

CTGGCCTGTGAGGAQTTCAAGAAGACCAGGTCGACTGCAAAGCTAGTCACCAAGG 
CCCACAGGATGTTTGAGGAGTTTGTGGATGTGCAGGCTCCACGGGAGGTGAATATC
GATTTCCAGACCCGAGAGGCCACGAGGAAGAACATGCAGGAGCCGTCCCTGACTT
GTTTTGATCAAGCCCAGGGAAAAGTCCACAGCCTC (SEQ ID NO: 17);

GAAGCCTGTGAGGATCTGAAGTATGGGGATCAGTCCAAGGTCAAGGAGAAGGCAG 
AGGAGATCTACAAGCTGTTCOTGGCACCGGGTGCAAGGCGATGGATCAACATAGAC 
GGCAAAACCATGGACATCACCGTGAAGGGGCTGAGACACCCCCACCGCTATGTGTT
GGACGCGGCGCAGACCCACATTTACATGCTC (SEQ ID NO: 18);

CTGGCTTGTGAGGATTTCAAGAAGGTCAAATCGCAGTCCAAGATGGCAGCCAAAGC 
CAAGAAGATCTTTGCTGAGTTCATCGCGATCCAGGCTTGCAAGGAGGTAAACCTGG 
ACTCGTACACACGAGAACACACTAAGGAGAACCTGCAGAGCATCACCCGAGGCTG 
CTTTGACCTGGCACAAAAACGTATCTTCGGGCTC (SEQ ID NO: 19);

GTTGCCTGTGAGAATTACAAGAAGATCAAGTCCCCCATCAAAATGGCAGAGAAGGC
AAAGCAAATCTATGAAGAATTCATCCAGACAGAGGCCCCTAAAGAGGTGAACATT
GACCACTTCACTAAAGACATCACCATGAAGAACCTGGTGGAACCTTCCCCTCACAG 
CTTTGACCTGGCCCAGAAAAGGATCTACGCCCTG (SEQ ID NO: 20); 

CTGGCCGTCCAAGATCTCAAGAAGCAACCTCTACAGGATGTGGCCAAGAGGGTGG
AGGAAATCTGGCAAGAGTTCCTAGCTCCCGGAGCCCCAAGTGCAATCAACCTGGAT
TCTCACAGCTATGAGATAACCAGTCAGAATGTCAAAGATGGAGGGAGATACACATT
TGAAGATGCCCAGGAGCACATCTACAAGCTG (SEQ ID NO: 21);

CTAGCGTGTGAAGATTTCAAGAAAACGGAGGACAAGAAGCAGATGCAGGAAAAGG
CCAAGAAGATCTACATGACCTTCCTGTCCAATAAGGCCTCTTCACAAGTCAATGTG 
GAGGGGCAGTCTCGGCTCACTGAAAAGATTCTGGAAGAACCACACCCTCTGATGTT 
CCAAAAGCTCCAGGACCAGATCTTCAATCTC (SEQ ID NO: 22); and 

GAGGCGTGTGAGGAGCTGCGCTTTGGCGGACAGGCCCAGGTCCCCACCCTGGTGGA
CTCTGTTTACCAGCAGTTCCTGGCCCCTGGAGCTGCCCGCTGGATCAACATTGACA
GCAGAACAATGGAGTGGACCCTGGAGGGGCTGCGCCAGCCACACCGCTATGTCCT
AGATGCAGCACAACTGCACATCTACATGCTC (SEQ ID NO: 23).
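
The degenerate oligonucleotides of SEQ ID NOS: 2-5 each stand for a pool of concrete sequences. The following is a minimal sketch, not part of the disclosed methods, of how such a pool could be counted and enumerated. It uses the symbol definitions exactly as stated with those sequences (N is G or A; R is T or C; Y is A, T, or C; S is A or C), which differ from the usual IUPAC meanings. The symbol I is not defined in the text; the sketch assumes it is a single fixed position (for example inosine) and passes it through unchanged. The function names are hypothetical.

```python
from itertools import product

# Symbol table copied from the definitions given with SEQ ID NOS: 2-5.
# Assumption: "I" is a single fixed base analogue and is left as-is.
DOC_CODES = {
    "N": "GA",
    "R": "TC",
    "Y": "ATC",
    "S": "AC",
}


def choices(symbol):
    """Return the bases a symbol may stand for (itself if not degenerate)."""
    return DOC_CODES.get(symbol, symbol)


def degeneracy(oligo):
    """Number of distinct sequences represented by a degenerate oligo."""
    count = 1
    for symbol in oligo:
        count *= len(choices(symbol))
    return count


def expand(oligo):
    """Yield every concrete sequence represented by a degenerate oligo."""
    for combo in product(*(choices(symbol) for symbol in oligo)):
        yield "".join(combo)


SEQ_ID_NO_2 = "GNIGANAARYTIGANTTRTGG"
print(degeneracy(SEQ_ID_NO_2))
```

Under these assumptions SEQ ID NO: 2 contains three N positions, two R positions and one Y position, so the pool size is the product of the per-position choices.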

In another aspect, the invention features a
substantially pure polypeptide including one or a
combination of the amino acid sequences:

Xaa1 Xaa2 Xaa3 Glu Xaa4 Xaa5 Xaa6 Xaa7, wherein
Xaa1 is I, L, E, or V, preferably L; Xaa2 is A, S, or E,
preferably A; Xaa3 is C or V, preferably C; Xaa4 is D, E,
N, or K, preferably D; Xaa5 is L, Y, or F; Xaa6 is K or R,
preferably R; and Xaa7 is K, R, Y, or F, preferably K
(SEQ ID NO: 25); and

Xaa1 Xaa2 Xaa3 Xaa4 Xaa5 Xaa6 Xaa7 Xaa8 Xaa9 Xaa10
Lys, wherein Xaa1 is F or L, preferably F; Xaa2 is D, E,
T, or Q, preferably D; Xaa3 is E, D, T, Q, A, L, or K;
Xaa4 is A or L, preferably A; Xaa5 is Q or A, preferably
Q; Xaa6 is L, D, E, K, T, G, or H; Xaa7 is H, R, K, Q or D;
Xaa8 is I or V, preferably I; Xaa9 is Q, T, S, N, K, M, G
or A (SEQ ID NO: 26). More preferably, the sequences are
LACEDXaaK, wherein Xaa is L, Y, or F (SEQ ID NO: 33); and
FDXaa1AQXaa2Xaa3IXaa4, wherein Xaa1 is E, D, T, Q, A, L,
or K; Xaa2 is L, D, E, K, T, G, or H; and Xaa3 is H, R, K,
Q, or D (SEQ ID NO: 34).
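
Consensus sequences of this kind can be searched for in a protein sequence with a simple pattern match. The sketch below is illustrative only: it reads SEQ ID NO: 33 as L-A-C-E-D-Xaa-K and SEQ ID NO: 34 as F-D-Xaa1-A-Q-Xaa2-Xaa3-I-Xaa4, and, because Xaa4 of SEQ ID NO: 34 is not defined in the text, it assumes that position accepts any residue. The helper names and the test peptides are hypothetical and are not sequences from the figures.

```python
import re


def motif_to_regex(positions):
    """Build a compiled regex from per-position allowed residues.

    Each entry is a string of allowed one-letter codes, or None for a
    position that is assumed to accept any residue.
    """
    parts = []
    for allowed in positions:
        if allowed is None:
            parts.append("[A-Z]")
        elif len(allowed) == 1:
            parts.append(allowed)
        else:
            parts.append("[" + allowed + "]")
    return re.compile("".join(parts))


# SEQ ID NO: 33 as read from the text: LACEDXaaK, Xaa is L, Y, or F.
SEQ_33 = motif_to_regex(["L", "A", "C", "E", "D", "LYF", "K"])

# SEQ ID NO: 34 as read from the text; the final Xaa is undefined there,
# so it is treated as a wildcard (assumption).
SEQ_34 = motif_to_regex(
    ["F", "D", "EDTQALK", "A", "Q", "LDEKTGH", "HRKQD", "I", None]
)


def find_motif(pattern, protein):
    """Return 0-based start offsets of non-overlapping motif matches."""
    return [m.start() for m in pattern.finditer(protein.upper())]


# Hypothetical peptides used only to exercise the patterns.
print(find_motif(SEQ_33, "MSLACEDYKQQ"))
print(find_motif(SEQ_34, "GGFDEAQLHIQTT"))
```

A match only shows that a candidate carries the consensus residues; whether it alters a G-protein mediated response still has to be established by the functional assays described in the text.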

In preferred embodiments the invention features
polypeptides having the sequences substantially identical
to the EGL-10 and the human RGS2 polypeptides shown in
Fig. 3C. More preferably, the polypeptides are identical
to the sequences of EGL-10 and human RGS2 provided in
Fig. 3C.

In another aspect, the invention features a method
of isolating an rgs gene or fragment thereof from a cell,
involving: (a) providing a sample of cellular DNA; (b)
providing a pair of oligonucleotides having sequence
homology to a conserved region of an rgs gene (for
example, the oligonucleotides of SEQ ID NOS: 2-5); (c)
combining the pair of oligonucleotides with the cellular
DNA sample under conditions suitable for polymerase chain
reaction-mediated DNA amplification; and (d) isolating
the amplified rgs gene or fragment thereof. Where a
fragment is obtained by PCR, standard library screening
techniques may be used to obtain the complete coding
sequence. In preferred embodiments, the amplification is
carried out using a reverse-transcription polymerase
chain reaction, for example, the RACE method.

In another aspect, the invention features a method
of identifying an rgs gene in a cell, involving: (a)
providing a preparation of cellular DNA (for example,
from the human genome); (b) providing a detectably-
labelled DNA sequence (for example, prepared by the
methods of the invention) having homology to a conserved
region of an rgs gene; (c) contacting the preparation of
cellular DNA with the detectably-labelled DNA sequence
under hybridization conditions providing detection of
genes having 50% or greater sequence identity; and (d)
identifying an rgs gene by its association with the
detectable label.

In another aspect, the invention features a method
of isolating an rgs gene from a recombinant DNA library,
involving: (a) providing a recombinant DNA library; (b)
contacting the recombinant DNA library with a detectably-
labelled gene fragment produced according to the PCR
method of the invention under hybridization conditions
providing detection of genes having 50% or greater
sequence identity; and (c) isolating a member of an rgs
gene by its association with the detectable label.

In another aspect, the invention features a method
of isolating an rgs gene from a recombinant DNA library,
involving: (a) providing a recombinant DNA library; (b)
contacting the recombinant DNA library with a detectably-
labelled RGS oligonucleotide of the invention under
hybridization conditions providing detection of genes
having 50% or greater sequence identity; and (c)
isolating an rgs gene by its association with the
detectable label.

In another aspect, the invention features a
recombinant polypeptide capable of altering G-protein
mediated events wherein the polypeptide includes a domain
having a sequence which has at least 70% identity to at
least one of the sequences of SEQ ID NOS: 1, 6-14,
25 or 26. More preferably, the region of identity is 80%
or greater, most preferably the region of identity is 95%
or greater.

In another aspect, the invention features an rgs
gene isolated according to the method involving: (a)
providing a sample of cellular DNA; (b) providing a pair
of oligonucleotides having sequence homology to a
conserved region of an rgs gene; (c) combining the pair
of oligonucleotides with the cellular DNA sample under
conditions suitable for polymerase chain reaction-
mediated DNA amplification; and (d) isolating the
amplified rgs gene or fragment thereof.

In another aspect, the invention features an rgs
gene isolated according to the method involving: (a)
providing a preparation of cellular DNA; (b) providing a
detectably-labelled DNA sequence having homology to a
conserved region of an rgs gene; (c) contacting the
preparation of DNA with the detectably-labelled DNA
sequence under hybridization conditions providing
detection of genes having 50% or greater sequence
identity; and (d) identifying an rgs gene by its
association with the detectable label.

In another aspect, the invention features an rgs
gene isolated according to the method involving: (a)
providing a recombinant DNA library; (b) contacting the
recombinant DNA library with a detectably-labelled rgs
gene fragment produced according to the method of the
invention under hybridization conditions providing
detection of genes having 50% or greater sequence
identity; and (c) isolating an rgs gene by its
association with the detectable label.

In another aspect, the invention features a method
of identifying an rgs gene involving: (a) providing a
mammalian cell sample; (b) introducing by transformation
(e.g. biolistic transformation) into the cell sample a
candidate rgs gene; (c) expressing the candidate rgs gene
within the cell sample; and (d) determining whether the
cell sample exhibits an alteration in G-protein mediated
response, whereby a response identifies an rgs gene.

Preferably, the cell sample used herein is
selected from cardiac myocytes or other smooth muscle
cells, neutrophils, mast cells or other myeloid cells,
insulin secreting β-cells, COS-7 cells, or Xenopus
oocytes. In other preferred embodiments the candidate
rgs gene is obtained from a cDNA expression library, and
the RGS response is a membrane trafficking or secretion
response or an alteration in [3H]IP3 or cAMP levels.

In another aspect, the invention features an rgs
gene isolated according to the method involving: (a)
providing a cell sample; (b) introducing by
transformation into the cell sample a candidate rgs gene;
(c) expressing the candidate rgs gene within the tissue
sample; and (d) determining whether the tissue sample
exhibits a G-protein mediated response or decrease
thereof, whereby a response identifies an rgs gene.

In another aspect, the invention features a
purified antibody which binds specifically to an RGS
family protein. Such an antibody may be used in any
standard immunodetection method for the identification of
an RGS polypeptide.

In another aspect, the invention features a DNA
sequence substantially identical to the DNA sequence
shown in Figure 2A. In a related aspect, the invention
features a DNA sequence substantially identical to the
DNA sequence shown in Fig. 7.

In two additional aspects, the invention features
substantially pure polypeptides having sequences
substantially identical to amino acid sequences shown in
Figure 3C (SEQ ID NOS: 27 and 40).

In another aspect, the invention features a kit
for detecting compounds which regulate G-protein
signalling. The kit includes RGS encoding DNA positioned
for expression in a cell capable of producing a
detectable G-protein signalling response. Preferably,
the cell is a cardiac myocyte, a mast cell, or a
neutrophil.

In a related aspect, the invention features a
method for detecting a compound which regulates G-protein
signalling. The method includes:

i) providing a cell having RGS encoding DNA
positioned for expression; ii) contacting the cell with
the compound to be tested; iii) monitoring the cell for
an alteration in G-protein signalling response.

Preferably, the cell used in the method is a
cardiac myocyte, a mast cell, or a neutrophil, and the
responses assayed are an electrophysiological response, a
degranulation response, or IL-8 mediated response,
respectively.

For aforementioned methods involving the use of
RGS proteins or rgs genes it is noted that the use of
1R20/BL34 or gos-8 nucleic acids or proteins encoded
therefrom are also included as methods of the invention.
Preferably 1R20/BL34 and gos-8 nucleic acids and encoded
proteins are used in methods for regulating G-protein
signalling.

By "rgs" is meant a gene encoding a polypeptide
capable of altering a G-protein mediated response in a
cell or a tissue and which has at least 50% or greater
[...]
The preferred regions of identity are as described below
under "conserved regions." An rgs gene is a gene
including a DNA sequence having about 50% or greater
sequence identity to the RGS sequences which encode the
conserved polypeptide regions shown in Fig. 3B and
described below, and which encodes a polypeptide capable
of altering a G-protein mediated response. EGL-10 and
the human rgs2 are examples of rgs genes encoding the
EGL-10 polypeptide from C. elegans and a human RGS
polypeptide, respectively.

By "polypeptide" is meant any chain of amino 
acids, regardless of length or post-translational 
modification (e.g., glycosylation or phosphorylation). 

By "substantially identical" is meant a
polypeptide or nucleic acid exhibiting at least 50%,
preferably 85%, more preferably 90%, and most preferably
95% homology to a reference amino acid or nucleic acid
sequence. For polypeptides, the length of comparison
sequences will generally be at least 16 amino acids,
preferably at least 20 amino acids, more preferably at
least 25 amino acids, and most preferably 35 amino acids.
For nucleic acids, the length of comparison sequences
will generally be at least 50 nucleotides, preferably at
least 60 nucleotides, more preferably at least 75
nucleotides, and most preferably 110 nucleotides.

Sequence identity is typically measured using
sequence analysis software (e.g., Sequence Analysis
Software Package of the Genetics Computer Group,
University of Wisconsin Biotechnology Center, 1710
University Avenue, Madison, WI 53705). Such software
matches similar sequences by assigning degrees of
homology to various substitutions, deletions,
substitutions, and other modifications. Conservative
substitutions typically include substitutions within the
following groups: glycine, alanine; valine, isoleucine,
leucine; aspartic acid, glutamic acid; asparagine,
glutamine; serine, threonine; lysine, arginine; and
phenylalanine and tyrosine.
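
The identity thresholds and conservative substitution groups above can be illustrated with a short calculation. The sketch below is not the Genetics Computer Group software and performs no alignment: it assumes two polypeptide sequences that are already aligned, ungapped and of equal length. It treats the "at least 50%" figure as percent identity and uses the 16 amino acid minimum comparison length from the definition of "substantially identical"; both choices, the function names and the example peptides are assumptions made for illustration.

```python
# Conservative substitution groups as listed in the text (one-letter codes).
CONSERVATIVE_GROUPS = [
    set("GA"),   # glycine, alanine
    set("VIL"),  # valine, isoleucine, leucine
    set("DE"),   # aspartic acid, glutamic acid
    set("NQ"),   # asparagine, glutamine
    set("ST"),   # serine, threonine
    set("KR"),   # lysine, arginine
    set("FY"),   # phenylalanine, tyrosine
]


def is_conservative(a, b):
    """True if residues a and b differ but fall in the same listed group."""
    return a != b and any(a in g and b in g for g in CONSERVATIVE_GROUPS)


def percent_identity(seq_a, seq_b):
    """Percent of aligned positions with identical residues."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be pre-aligned and of equal length")
    matches = sum(1 for a, b in zip(seq_a, seq_b) if a == b)
    return 100.0 * matches / len(seq_a)


def percent_similarity(seq_a, seq_b):
    """Percent of positions that are identical or conservatively substituted."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be pre-aligned and of equal length")
    hits = sum(
        1 for a, b in zip(seq_a, seq_b) if a == b or is_conservative(a, b)
    )
    return 100.0 * hits / len(seq_a)


def substantially_identical(seq_a, seq_b, threshold=50.0, min_length=16):
    """Apply the text's minimum threshold and polypeptide comparison length."""
    if len(seq_a) < min_length:
        return False
    return percent_identity(seq_a, seq_b) >= threshold


# Hypothetical 16-residue peptides differing by F->Y and K->R.
reference = "ACDEFGHIKLMNPQRS"
candidate = "ACDEYGHIRLMNPQRS"
print(percent_identity(reference, candidate))
print(percent_similarity(reference, candidate))
print(substantially_identical(reference, candidate))
```

Real comparisons would first align the sequences and score gaps, which is the part the cited software handles; the sketch only shows how the stated thresholds and substitution groups enter the arithmetic.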

By a "substantially pure polypeptide" is meant an
RGS polypeptide which has been separated from components
which naturally accompany it. Typically, the polypeptide
is substantially pure when it is at least 60%, by weight,
free from the proteins and naturally-occurring organic
molecules with which it is naturally associated.
Preferably, the preparation is at least 75%, more
preferably at least 90%, and most preferably at least
99%, by weight, RGS polypeptide. A substantially pure
RGS polypeptide may be obtained, for example, by
extraction from a natural source (e.g., a human or rat
cell); by expression of a recombinant nucleic acid
encoding an RGS polypeptide; or by chemically
synthesizing the protein. Purity can be measured by any
appropriate method, e.g., those described in column
chromatography, polyacrylamide gel electrophoresis, or by
HPLC analysis.

A protein is substantially free of naturally
associated components when it is separated from those
contaminants which accompany it in its natural state.
Thus, a protein which is chemically synthesized or
produced in a cellular system different from the cell
from which it naturally originates will be substantially
free from its naturally associated components.
Accordingly, substantially pure polypeptides include
those derived from eukaryotic organisms but synthesized
in E. coli or other prokaryotes.

By "substantially pure DNA" is meant DNA that is
free of the genes which, in the naturally-occurring
genome of the organism from which the DNA of the
invention is derived, flank the gene. The term therefore
includes, for example, a recombinant DNA which is
incorporated into a vector; into an autonomously
replicating plasmid or virus; or into the genomic DNA of
a prokaryote or eukaryote; or which exists as a separate
molecule (e.g., a cDNA or a genomic or cDNA fragment
produced by PCR or restriction endonuclease digestion)
independent of other sequences. It also includes a
recombinant DNA which is part of a hybrid gene encoding
additional polypeptide sequence.

By "transformed cell" is meant a cell into which 
(or into an ancestor of which) has been introduced, by 
means of recombinant DNA techniques, a DNA molecule 
encoding (as used herein) an RGS polypeptide. 

By "positioned for expression" is meant that the
DNA molecule is positioned adjacent to a DNA sequence
which directs transcription and translation of the
sequence (i.e., facilitates the production of, e.g., an
RGS polypeptide, a recombinant protein or a RNA
molecule).

By "reporter gene" is meant a gene whose
expression may be assayed; such genes include, without
limitation, β-glucuronidase (GUS), luciferase,
chloramphenicol transacetylase (CAT), and β-
galactosidase.

By "promoter" is meant minimal sequence sufficient
to direct transcription. Also included in the invention
are those promoter elements which are sufficient to
render promoter-dependent gene expression controllable
for cell-type specific, tissue-specific or inducible by
external signals or agents; such elements may be located
in the 5' or 3' regions of the native gene.

By "operably linked" is meant that a gene and a 
regulatory sequence(s) are connected in such a way as to 
permit gene expression when the appropriate molecules 
(e.g., transcriptional activator proteins) are bound to 
the regulatory sequence(s). 

By "transgene" is meant any piece of DNA which is 
inserted by artifice into a cell, and becomes part of the 
genome of the organism which develops from that cell. 
Such a transgene may include a gene which is partly or 
entirely heterologous (i.e., foreign) to the transgenic 
organism, or may represent a gene homologous to an 
endogenous gene of the organism. 

By "transgenic" is meant any cell which includes a 
DNA sequence which is inserted by artifice into a cell 
and becomes part of the genome of the organism which 
develops from that cell. As used herein, the transgenic 
organisms are generally transgenic rodents and the DNA 
(transgene) is inserted by artifice into the genome. 

By an "rgs gene" is meant any member of the family 
of genes characterized by their ability to regulate a 
G-protein mediated response and having at least 20%, 
preferably 30%, and most preferably 50% amino acid 
sequence identity to one of the conserved regions of one 
of the RGS members described herein (i.e., either the 
egl-10 gene or the rgs 1-9 gene sequences described 
herein). The rgs gene family does not include the FlbA, 
Sst-2, C05B5.7, GOS-8, or BL34 (also referred to as 1R20) 
gene sequences. 

By "conserved region" is meant any stretch of six 
or more contiguous amino acids exhibiting at least 30%, 
preferably 50%, and most preferably 70% amino acid 
sequence identity between two or more of the RGS family 
members. Examples of preferred conserved regions are 
shown (as overlapping or designated sequences) in Figs. 
3A and 3B and include the sequences provided by SEQ ID 
NOS: 2-5, 25 and 26. Preferably, the conserved region is 
a region shown by shading blocks in Fig. 3B (e.g., amino 
acids 1-43 and 92-120 of the EGL-10 sequence shown in 
Fig. 3B (SEQ ID NO: 1)). More preferably, the conserved 
region is the region delineated by a solid block in Fig. 
3B (e.g., amino acids 36-43 and 92-102 of the EGL-10 
sequence of Fig. 3B). Even more preferably, the 
conserved region is defined by the sequences of SEQ ID 
NOS: 1-5. Most preferably, the sequences are defined by 
the sequences of SEQ ID NOS: 33 and 34. 
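
The identity thresholds in the definition above can be checked mechanically. The following is a minimal sketch, assuming that the two stretches being compared are already aligned without gaps and that identity is counted position by position. The function names and the example peptides are hypothetical illustrations and are not taken from the sequence listing.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent of positions at which two equal-length, ungapped peptides match."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def conserved_windows(a: str, b: str, min_len: int = 6, threshold: float = 30.0):
    """Return (start, identity) for each min_len-residue window meeting the threshold."""
    if len(a) != len(b):
        raise ValueError("sequences must be of equal length")
    hits = []
    for start in range(len(a) - min_len + 1):
        identity = percent_identity(a[start:start + min_len],
                                    b[start:start + min_len])
        if identity >= threshold:
            hits.append((start, identity))
    return hits


# Hypothetical aligned peptides (illustration only, not from the SEQ ID NOS).
pep_a = "ACDEFGHIK"
pep_b = "ACDEYYHIK"

# Count qualifying six-residue windows at the 30%, 50% and 70% levels.
for level in (30.0, 50.0, 70.0):
    print(level, len(conserved_windows(pep_a, pep_b, threshold=level)))
```

A real comparison would normally start from a gapped alignment produced by an alignment program; the fixed-window scan here only illustrates how the "six or more contiguous amino acids" and percent-identity criteria combine.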

By "detectably-labelled" is meant any means for 
marking and identifying the presence of a molecule, e.g., 
an oligonucleotide probe or primer, a gene or fragment 
thereof, or a cDNA molecule. Methods for detectably- 
labelling a molecule are well known in the art and 
include, without limitation, radioactive labelling (e.g., 
with an isotope such as ³²P or ³⁵S) and nonradioactive 
labelling (e.g., chemiluminescent labelling, e.g., 
fluorescein labelling). 

By "transformation" is meant any delivery of DNA 
into a cell. Methods for delivery of DNA into a cell are 
well known in the art and include, without limitation, 
viral transfer, electroporation, lipid mediated transfer 
and biolistic transfer. 

By "biolistic transformation" is meant any method 
for introducing foreign molecules into a cell using 
velocity-driven microprojectiles such as tungsten or gold 
particles. Such velocity-driven methods originate from 
pressure bursts which include, but are not limited to, 
helium-driven, air-driven, and gunpowder-driven 
techniques. Biolistic transformation may be applied to 
the transformation or transfection of a wide variety of 
cell types and intact tissues including, without 
limitation, intracellular organelles, bacteria, yeast, 
fungi, algae, pollen, animal tissue, plant tissue and 
cultured cells. 

By "purified antibody" is meant antibody which is 
at least 60%, by weight, free from proteins and 
naturally-occurring organic molecules with which it is 
naturally associated. Preferably, the preparation is at 
least 75%, more preferably 90%, and most preferably at 
least 99%, by weight, antibody, e.g., an EGL-10 specific 
antibody. A purified RGS antibody may be obtained, for 
example, by affinity chromatography using recombinantly- 
produced RGS protein or conserved motif peptides and 
standard techniques. 

By "specifically binds" is meant an antibody which 
recognizes and binds an RGS protein but which does not 
substantially recognize and bind other molecules in a 
sample, e.g., a biological sample, which naturally 
includes RGS protein. 

By "regulating" is meant conferring a change 
(increase or decrease) in the level of a G-protein 
mediated response relative to that observed in the 
absence of the RGS polypeptide, DNA encoding the RGS 
polypeptide, or test compound. Preferably, the change in 
response is at least 5%, more preferably, the change in 
response is greater than 20%, and most preferably, the 
change in response level is a change of more than 50% 
relative to the levels observed in the absence of the RGS 
compound or test compound. 
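
The thresholds in this definition amount to a simple calculation on two measurements. The sketch below assumes the response is a single positive number (for example, a rate or a signal level) measured without and with the RGS polypeptide or test compound. The function names and tier labels are hypothetical and are used only for illustration.

```python
def percent_change(baseline: float, observed: float) -> float:
    """Signed percent change of a response relative to the untreated baseline."""
    if baseline == 0:
        raise ValueError("baseline response must be non-zero")
    return 100.0 * (observed - baseline) / baseline


def regulation_tier(change_pct: float) -> str:
    """Map the magnitude of a change onto the tiers named in the definition."""
    magnitude = abs(change_pct)  # increases and decreases both count
    if magnitude > 50:
        return "most preferred (more than 50%)"
    if magnitude > 20:
        return "more preferred (greater than 20%)"
    if magnitude >= 5:
        return "preferred (at least 5%)"
    return "below the 5% threshold"


# Hypothetical readings in arbitrary units: (baseline, observed).
for baseline, observed in [(100.0, 130.0), (80.0, 60.0), (100.0, 104.0)]:
    change = percent_change(baseline, observed)
    print(f"{change:+.1f}% -> {regulation_tier(change)}")
```

Because the definition covers both increases and decreases, the tier is assigned from the absolute value of the change.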

By "G-protein signalling response" is meant a 
response mediated by heterotrimeric guanine nucleotide 
binding proteins. It will be appreciated that these 
responses and assays for detecting these responses are 
well-known in the art. For example, many such responses 
are described in the references provided in the detailed 
description, below. 

By an "effective amount" is meant an amount 
sufficient to regulate a G-protein mediated response. It 
will be appreciated that there are many ways known in the 
art to determine the effective amount for a given 
application. For example, the pharmacological methods 
for dosage determination may be used in the therapeutic 
context. 

Other features and advantages of the invention 
will be apparent from the following description of the 
preferred embodiments thereof, and from the claims. 

Detailed Description 
The drawings will first be described. 
Drawings 

Fig. 1A is the genetic map of the region of C. elegans 
chromosome V that contains the gene egl-10. 

Fig. 1B is a physical map of the egl-10 region of 
the C. elegans genome. 

Fig. 2A is the nucleotide sequence of egl-10 cDNA 
and the amino acid sequence from the open reading frame, 
EGL-10 (SEQ ID NO: 27. ADD SEQ NO for egl-10 cDNA). 

Fig. 2B shows the positions of egl-10 introns and 
exons and the positions of egl-10 mutations therein. 

Fig. 2C is Northern blot analysis with egl-10 
cDNA. 

Fig. 2D is the sequence of egl-10 mutations. 

Fig. 3A is a diagram of EGL-10 and structurally 
related proteins showing amino acid sequences in 
conserved domains. 

Fig. 3B shows the sequences of RGS regions of 
homology (SEQ ID NOS: 1, 6-14, 28-32, 30-32, and 36-39. 
The RGS-3-4 sequences are isolated from the rest). 

Fig. 3C is a comparison of the EGL-10 amino acid 
sequence and the human RGS7 sequence (SEQ ID NOS: 27 and 
40). 

Fig. 4 is a photograph of a Northern blot showing 
distribution of egl-10 homolog mRNAs in various rat 
tissues. 

Fig. 5 shows the partial DNA sequences from 
the rat rgs genes, referred to as RGSs 1-7 sequences (SEQ 
ID NOS: 15-23). 

Figs. 6A-6G show EGL-10 protein expression. Fig. 
6A shows western blot analysis of protein extracts from 
wild-type and egl-10(md176) worms probed with the 
affinity purified anti-EGL-10 polyclonal antibodies. The 
filled arrow indicates the position of the EGL-10 protein 
detected in wild-type but not in egl-10 mutant extracts. 
The open arrow indicates the 47 kD protein that 
cross-reacted with the EGL-10 antibodies but was not a 
product of the EGL-10 gene. The positions of molecular 
weight markers are indicated, with their sizes in kD. 
Fig. 6B shows anti-EGL-10 antibody staining of the head 
of a wild-type adult hermaphrodite. The dark 
immunoperoxidase stain labeled the neural processes of 
the nerve ring (arrow). Fig. 6C shows anti-EGL-10 
antibody staining of the head of an egl-10(md176) adult 
hermaphrodite, prepared in parallel to the preparation in 
Fig. 6B and lacking any specific staining. Fig. 6D shows 
anti-EGL-10 immunofluorescence staining in the mid-body 
region of a wild-type adult. The fluorescence here and in 
panels E-G appears white on a black background, the 
reverse of the staining in Figs. 6B and 6C. The arrow 
points to the brightly stained ventral cord neural 
processes. Body-wall muscle cells on either side of the 
ventral cord contained brightly stained spots arranged in 
linear arrays. Body-wall muscles throughout the animal 
showed similar staining. Fig. 6E shows fluorescence in 
the head of a transgenic adult carrying a fusion of the 
egl-10 promoter and N-terminal coding sequences to the 
green fluorescent protein (GFP) gene. The fusion protein 
is localized in spots within the body-wall muscles 
similar to those seen in Fig. 6D. GFP fluorescence was 
also present in neural processes and cell bodies out of 
the plane of focus. Fig. 6F shows anti-EGL-10 antibody 
staining in the head of a transgenic worm carrying the 
nIs51 multicopy array of wild-type egl-10 genes. Fig. 6G 
shows anti-EGL-10 antibody staining in the vulva region 
of nIs51 worms. The open arrow points to the vulva. The 
large filled arrow indicates the HSN neuron. The small 
filled arrow points to the ventral cord and associated 
neural cell bodies. 

Fig. 7 shows the human rgs2 cDNA sequence (SEQ ID 
NO: 41). 

I. EGL-10 identifies a new family of heterotrimeric 
G-protein pathway associated proteins which are 
regulators of G-protein signalling (RGS's). 

A. Characteristics of egl-10. 

1. Phenotypes conferred by mutation of the egl-10 
gene. 

The phenotypes conferred by mutations in egl-10 
have been further characterized. As previously described, 
egl-10 loss-of-function mutants fail to lay eggs and have 
sluggish locomotory behavior (C. Trent, et al. (1983) 
Genetics 104:619-647). We have now discovered that the 
overexpression of egl-10 produces the opposite effects: 
hyperactive egg-laying and locomotion. More generally, 
we have discovered that the rates of egg-laying and 
locomotory behaviors are proportional to the number of 
functional copies of egl-10. 

The phenotypes conferred by mutations in egl-10 
are strikingly similar to those conferred by mutations in 
goa-1 (J.E. Mendel, et al. (1995) Science 267:1652-5; L. 
Segalat, et al. (1995) Science 267:1648-52). However, 
these phenotypes are reversed relative to the level of 
gene function: mutations of egl-10 which enhance gene 
function increase the rate of various behaviors whereas 
those mutations that reduce gene function decrease the 
rates of these behaviors. By contrast, mutations in goa-1 
which reduce function increase the rate of behaviors, 
whereas overexpression decreases the rate of the 
behaviors. The occurrence of such a similar constellation 
of phenotypes strongly suggests that the EGL-10 and GOA-1 
proteins have related functions, as components of the 
same or parallel genetic pathways. Since GOA-1 is the 
nematode homolog of the heterotrimeric G-protein, Gαo, it 
is thus likely that EGL-10 plays a role in one or more 
heterotrimeric G-protein regulatory pathways which 
contain Gαo. 

We have further discovered that loss of function 
mutations in egl-10 confer resistance to drugs that 
affect C. elegans by acting as inhibitors of 
acetylcholinesterase (AChE). Other mutations that confer 
resistance to AChE inhibitors have been shown to reduce 
the synthesis and packaging of the neurotransmitter 
acetylcholine (ACh) or to reduce the function of genes 
that encode proteins that comprise the biochemical 
machinery responsible for neurotransmitter release (M. 
Nguyen, A. Alfonso, C.D. Johnson and J.B. Rand (1995) 
Genetics 140:527-35). This result indicates that EGL-10, 
and presumably its associated G-protein coupled pathways, 
function to modulate the release of acetylcholine in C. 
elegans and may be involved in the release of other 
neurotransmitters as well. 

2. The cloning and sequencing of the egl-10 gene. 

egl-10 had been previously mapped between rol-4 
and lin-25 on chromosome V. Additional mapping, using 
RFLP markers, placed egl-10 within ~15 kb of DNA, 
contained entirely on a single cosmid clone (Fig. 1A). 
Germline transformation with DNA from a subclone from the 
region rescues the phenotype conferred by a mutation that 
reduces egl-10 function. Furthermore, the rescue is 
blocked by insertion of a synthetic oligonucleotide which 
interrupts an open reading frame, located entirely within 
the rescuing fragment, with a stop codon (Fig. 1B). The 
open reading frame thus very likely encodes the EGL-10 
protein. 

The fragment used for transformation rescue was 
used to screen several C. elegans cDNA libraries. The 
longest cDNA obtained (3.2 kb) was sequenced on both 
strands. The cDNA was judged to be full length since it 
contains a sequence matching the C. elegans trans- 
spliced-leader SL1 (M. Krause and D. Hirsh (1987) Cell 
49:753-61). The regions of the genomic clone to which 
this cDNA hybridized were sequenced on one strand. The 
egl-10 genomic structure was deduced by comparing the 
cDNA and genomic sequences. The 3169 nucleotide long 
sequence obtained from the cDNA and the 555 amino acid 
long predicted amino acid sequence of the putative EGL-10 
protein are shown in Fig. 2A. The organization of exons 
and introns within genomic DNA is shown in Fig. 2B. 
Northern blot analysis (Fig. 2C) showed the presence of a 
single mRNA species at ~3.2 kb. 

We sequenced the putative egl-10 genomic cDNA 
obtained from a collection of independently isolated 
egl-10 mutations. Nine mutations induced by chemical 
mutagenesis were shown to contain point mutations within 
the gene. Six of the mutations created new stop codons 
leading to truncated proteins; the other three mutations 
produced amino acid sequence changes (Fig. 2D). Five 
spontaneous egl-10 mutations, isolated from a genetically 
unstable strain of C. elegans, were shown to contain 
either an insertion of the transposon Tc1 or a 
rearrangement (Fig. 2D). Locations of these mutations 
within the gene are shown in Figures 2A and 2B. The 
observation that many egl-10 mutations have detectable 
defects in a putative egl-10 cDNA is considered proof 
that this cDNA encodes the EGL-10 gene product. 



B. egl-10 is a member of a new gene family, the rgs 
family. 

The egl-10 gene consists largely of novel 
sequences. However, a search of protein sequence 
databases indicated that the gene encodes a 119 amino 
acid domain (Figure 3A) that is also present in the 
predicted amino acid sequences of two small human genes, 
known as BL34/1R20 and GOS-8. The functions of BL34/1R20 
and GOS-8 were previously completely unknown, and these 
genes were identified only as sequences whose expression 
is increased in B lymphocytes stimulated with phorbol 
esters. In addition, a conceptual gene of unknown 
function, called C05B5.7, identified by the C. elegans 
genome sequencing project, also contains this conserved 
domain. Thus, EGL-10 appears to identify a family of 
proteins with multiple members in the same species and 
homologs in related species. By using degenerate probes 
from the conserved domain (in EGL-10, BL34/1R20, GOS-8, 
and C05B5.7) and PCR, we isolated 9 novel sequences that 
contain the conserved domain from rat brain cDNA 
(labelled as rat gene fragments 3 through 11; Fig. 3B). 
The rat gene fragments isolated using this method are 
called rgss-1 through rgss-9 for regulator of G-protein 
signalling similarity. It appears that there exists a 
substantial number of genes in mammals that are members 
of the rgs family. 

We also observed weak sequence similarities 
between portions of the conserved domain in egl-10 and 
regions of the sst-2 gene of the yeast Saccharomyces 
cerevisiae and the flbA gene of the fungus Aspergillus 
nidulans. The function of the SST-2 protein appears to 
involve one mode of adaptation in the G-protein pathway 
responsible for transduction of the binding of the yeast 
mating factors a and α to their respective 7-TMRs. 
Evidence from studies of the sensitivity of yeast Gα to a 
specialized form of proteolysis suggests that SST-2 
protein may interact directly with Gα. The functions of 
FlbA are much less well studied. 

II. Methods for identifying new members of the rgs/egl-10 
gene family. 

The region of homology we have identified may be 
used to obtain additional members of the RGS family. For 
example, sequences from the genes rgss-1 through rgss-9 
were obtained by PCR using degenerate oligonucleotide 
primers designed to encode the amino acid sequences of 
EGL-10, 1R20, and BL34 proteins at the positions 
indicated in Fig. 3B. Two 5' primer pools were used 
with two 3' primer pools in all four possible 
combinations. After two rounds of amplification all four 
primer pairs gave detectable products of ~240 bp. 
These products were used to prepare clone libraries, 
restriction maps were prepared for selected clones from 
each library, clones with different restriction maps were 
divided into classes, and then several clones from each 
restriction map class were sequenced. In total 47 clones 
were sequenced. Each of the nine rgs genes identified by 
this approach was isolated at least twice. As a result, 
we conclude that it is likely that we have identified 
nearly all the rgs genes that can be amplified from rat 
brain cDNA using these primer pairs. 
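
The degenerate-primer strategy described above can be illustrated with a short calculation. The sketch below assumes the standard genetic code and estimates how many distinct oligonucleotides are needed to encode a given peptide in every possible way, then enumerates the pairings of two 5' pools with two 3' pools. The pool labels and example peptides are hypothetical, and the actual primer sequences used are not reproduced here.

```python
from itertools import product
from math import prod

# Number of codons per amino acid in the standard genetic code.
CODONS_PER_RESIDUE = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4,
    "H": 2, "I": 3, "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6,
    "T": 4, "W": 1, "Y": 2, "V": 4,
}


def degeneracy(peptide: str) -> int:
    """Number of distinct coding sequences for a peptide (full back-translation)."""
    return prod(CODONS_PER_RESIDUE[residue] for residue in peptide)


# Hypothetical pool labels; two 5' pools combined with two 3' pools.
forward_pools = ["5'-pool-1", "5'-pool-2"]
reverse_pools = ["3'-pool-1", "3'-pool-2"]
primer_pairs = list(product(forward_pools, reverse_pools))

for forward, reverse in primer_pairs:
    print(forward, "+", reverse)

# Hypothetical peptides: low-degeneracy residues keep the pool small.
print(degeneracy("FKN"), degeneracy("LSR"))
```

In practice a stretch rich in residues with one or two codons is preferred, and highly degenerate positions are often split across separate pools, which is one plausible reason for using two pools at each end.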

At least some of the rgs sequences are expressed 
in a wide variety of mammalian tissues, as demonstrated 
by Northern blotting (Fig. 4). Additional G-protein 
signalling genes may be identified by using the same 
primer pairs with cDNA from other rat tissues, with human 
cDNAs or with cDNAs from other species. In addition, 
additional rgs genes may be identified using alternate 
primers, based on different amino acid sequences that are 
conserved not only in the EGL-10, BL34, and 1R20 proteins, 
but also in the conceptual protein encoded by C05B5.7, in 
SST2 and FlbA and in the proteins encoded by the rgs 
genes described herein. 

III. The functional characterization of new rgs/RGS 
family members 

A. General considerations. 

The function of newly discovered rgs genes can be 
determined by analyzing: 

i) the effects of RGS proteins in vivo and in vitro, 
ii) the effects of antibodies specific to RGS proteins, 
or 
iii) the effects of antisense rgs oligonucleotides 

in well characterized assay systems that measure 
functions of mammalian heterotrimeric G-protein coupled 
pathways. Relevant assays for RGS activity include 
systems based on responses of intact cells or cell lines 
to ligands that bind to 7-TMRs, systems based on 
responses of permeabilized cells and cell fragments to 
direct or indirect activation of G-proteins and in vitro 
systems that measure biochemical parameters indicative of 
the functioning of G-protein pathway components or an 
interaction between G-protein pathway components. The 
G-protein pathway components whose functions or 
interactions are to be measured can be produced either 
through the normal expression of endogenous genes, 
through induced expression of endogenous genes, through 
expression of genes introduced, for example, by 
transfection with a virus that carries the gene or a cDNA 
for the gene of interest or by microinjection of cDNAs, 
or by the direct addition of proteins (either recombinant 
or purified from a relevant tissue) to an in vitro assay 
system. 

detect and screen new RGS genes and polypeptides. 

Specific assay systems, including those which are 
relevant to the pathophysiology of human disease and/or 
are useful for the discovery and characterization of new 
targets for human therapeutics are as follows: 

1. Assays based on natural responses of intact 
cells. 

Many mammalian cells, for example cardiac 
myocytes, other smooth muscle cells, neutrophils, mast 
cells and other classes of myeloid cells and insulin 
secreting β cells of the pancreas have readily detected 
responses mediated by heterotrimeric G-protein dependent 
pathways. To determine if a particular RGS protein is 
involved in such a pathway, one may compare the response 
of normal cells to the response which is obtained in 
cells transfected or transiently transformed by the rgs 
gene. Transformation may be done with the RGS cDNA under 
the appropriate promoter or with a construct designed to 
overexpress antisense oligonucleotides to the rgs mRNA. 

For example, we could express an rgs gene or 
antisense oligonucleotides to an rgs mRNA in mammalian 
cardiac myocytes as described, for example, by Ramirez et 
al. (M.T. Ramirez, G.R. Post, P.V. Sulakhe and J.H. 
Brown (1995) J. Biol. Chem. 270:8446-51). Cardiac 
myocytes respond to a variety of ligands, for 
example α- and β-adrenergic agonists and muscarinic 
agonists, by altering membrane conductances, including 
conductances to Cl⁻, K⁺ and Ca²⁺. These effects are 
mediated by G-proteins through a web of both second 
messenger mediated and membrane delimited effects and are 
readily measured with a variety of well known 
electrophysiological technologies (for example: T.C. 
Hwang, M. Horie, A.C. Nairn and D.C. Gadsby (1992) J. 
Gen. Physiol. 99:465-89). We would compare the response 
of normal myocytes to cells that overexpress a particular 
rgs gene or antisense oligonucleotides to a particular 
rgs mRNA. If no difference was observed, we would 
conclude that the particular RGS protein played no 
detectable role in cardiac myocyte physiology. On the 
other hand, if alterations in membrane currents were 
observed we would dissect the altered response using 
pharmacology, permeabilized cell systems and reconstituted 
G-protein pathway systems to determine the site of 
action of the RGS protein. One may use this system for 
specific screens to identify and test compounds that 
mimic or block the function of the RGS protein. 

2. Assays based on expression of cloned genes in 
particular cells or cell lines. 

The involvement of a RGS protein in some known 
functions and interactions between components of 
heterotrimeric G-protein pathways can be efficiently 
assessed in model systems designed for easy and efficient 
overexpression of cloned genes. One well developed 
system uses COS-7 cells (monkey kidney cells which 
possess the ability to replicate SV-40 origin-containing 
plasmids) as a host for the expression of cloned genes 
and cDNAs (D.Q. Wu, C.H. Lee, S.G. Rhee and M.I. Simon 
(1992) J. Biol. Chem. 267:1811-7). Recently, for example, 



WO 96/38462 



PCT/US96/08295 



overexpression of G-protein pathway genes in COS-7 cells 
was used to determine the capability of two forms of 
interleukin-8 receptor to activate the 5 different Gα 
subunits of the Gq family by measuring subsequent effects 
on the activity of two alternate types of PI-PLCβ, 
measured by quantifying the formation of [³H]IP3 in cells 
prelabelled with radioactive inositol (D. Wu, G.J. LaRosa 
and M.I. Simon (1993) Science 262:101-3). Similarly, 
co-expression in COS-7 cells has been used to quantitate 
the effects of proteins that inhibit signalling by 
activated G-proteins (W.J. Koch, B.E. Hawes, J. Inglese, 
L.M. Luttrell and R.J. Lefkowitz (1994) J. Biol. Chem. 
269:6193-7). 

A useful alternative to cell lines, more amenable 
to the study of membrane delimited activation of ion 
channels, involves the transient production of proteins 
following injection of mRNAs into Xenopus oocytes (E. 
Reuveny, P.A. Slesinger, J. Inglese, J.M. Morales, J.A. 
Iniguez-Lluhi, R.J. Lefkowitz, H.R. Bourne, Y.N. Jan and 
L.Y. Jan (1994) Nature 370:143-6). For example, the 
coexpression of two 7-TMRs (serotonin type 1C receptor 
and thyrotropin releasing hormone receptor) may be 
coupled with overexpression of one of seven alternate Gα 
subunits and with one of two alternate PI-PLCβs or 
adenylyl cyclase and the cystic fibrosis transmembrane 
conductance regulator (CFTR) (M.W. Quick, M.I. Simon, N. 
Davidson, H.A. Lester and A.M. Aragay (1994) J. Biol. 
Chem. 269:30164-72). Combined with expression of 
antisense oligonucleotides designed to block endogenous 
pathways, these systems can be engineered to measure 
specific interactions between 7-TMRs, G subunits, 
effectors, various inhibitors as well as components 
controlled by effectors. To determine the effect of an 
RGS protein one may compare the effect in transfected 
COS-7 cells or Xenopus oocytes with and without 
cotransfection with the rgs gene or cDNA. One may also 
transfect an rgs gene construct designed to overexpress 
antisense oligonucleotides to endogenous rgs mRNAs. 

If an RGS protein-dependent alteration of a 
G-protein dependent response is observed, one may utilize 
pharmacological tools and reconstituted G-protein pathway 
systems to determine the site of action of the RGS 
protein. From these experiments, a specific screen for 
identifying and testing compounds that mimic or block the 
function of the RGS protein may be developed. 

3. Assays utilizing permeabilized cells. 

The role of RGS proteins in intracellular events 
such as membrane trafficking or secretion can be studied 
in systems utilizing permeabilized cells, such as mast 
cells (T.H. Lillie and B.D. Gomperts (1993) Biochem. J. 
290:389-94), chromaffin cells of the adrenal medulla (N. 
Vitale, D. Aunis and M.F. Bader (1994) Cell. Mol. Biol. 
40:707-15) or more highly purified systems derived from 
these cells (J.S. Walent, B.W. Porter and T.F.J. Martin 
(1992) Cell 70:765-775). To determine the effects of RGS 
proteins one may compare the extent and kinetics of GTP 
or γS-GTP induced secretion in the presence and absence 
of excess RGS protein or antibodies specific to RGS 
proteins. 

If an RGS protein-dependent alteration of membrane 
trafficking or secretion is observed, further experiments 
may be used to explore the specificity and generality of 
this action and to determine the precise site of action 
of the RGS protein. From these experiments, a specific 
screen for identifying and testing compounds that mimic 
or block the function of the RGS protein can be 
constructed. 

4. Assays utilizing reconstituted G-protein 
pathways. 

The ability to assess specific protein-protein 
interactions between specific components that function 
within G-protein pathways may be employed to assign RGS 
functions. These assays generally use recombinant 
proteins purified from efficient expression systems, 
most commonly, i) insect Sf9 cells infected with 
recombinant baculovirus or ii) E. coli. Specific 
interactions which form part of G-protein pathways are 
then reconstituted with purified or partially purified 
proteins. The effects of RGS proteins on such systems can 
be easily assessed by comparing assays in the presence 
and absence of excess RGS protein or antibodies specific 
to RGS proteins. From these experiments, specific 
screens for identifying and testing compounds that mimic 
or block the function of the RGS protein can be 
developed. 
Uses 

RGS DNA, polypeptides, and antibodies have many 
uses. The following are examples and are not meant to be 
limiting. The RGS encoding DNA and RGS polypeptides may 
be used to regulate G-protein signalling and to screen 
for compounds which regulate G-protein signalling. For 
example, RGS polypeptides which increase secretion may be 
used industrially to increase the secretion into the 
media of commercially useful polypeptides. Once proteins 
are secreted, they may be more readily harvested. One 
method of increasing such secretion involves the 
construction of a transformed host cell which synthesizes 
both the RGS polypeptide and the commercially important 
protein to be secreted (e.g., TPA). RGS proteins, DNA, 
and antibodies may also be used in the diagnosis and 
treatment of disease. For example, regulation of 
G-protein signalling may be used to improve the outcome of 
patients with a wide variety of G-protein related 
diseases and disorders including, but not limited to: 
diabetes, hyperplasia, psychiatric disorders, 
cardiovascular disease, McCune-Albright Syndrome, and 
Albright hereditary osteopathy. 

IV. Deposit Information. 

GenBank accession numbers for the sequences 
provided herein are as follows: The worm sequence, 
egl-10, has number U32326. The rgs sequence fragments 
isolated from the rat are as follows: rgs5, U32434; rgs1, 
U32327; rgs6, U32435; rgs7, U32436; rat rgs2, U32328; 
rgs3, U32432; rgs4, U32433; rgs8, U32437; rgs9, U32438. 
Accession numbers for representative expressed sequence 
tags from human rgs genes are: RGS-1, R12757, F07186; 
RGS6, D31257, R35272; RGS10, R35472, T57943; RGS13, 
T94013; RGS11, R11933; RGS12, T92100. The human RGS7 
accession number is 442439. 

V. Examples. 

A. Characteristics of egl-10. 

1. Nematode strains. 

Nematode strains were maintained and grown at 20°C 
as described by Brenner (Brenner, (1974) Genetics 
77:71-94). Genetic nomenclature follows standard 
conventions (Horvitz et al., (1979) Mol. Gen. Genet. 
175:129-33). The following mutations were used: 
goa-1(n363, n1134) (Segalat et al., (1995) Science 
267:1648-51), arDf1 (Tuck and Greenwald, (1995) Genes & 
Development 9:341-57), egl-10 alleles (Trent et al., 
(1983) Genetics 104:619-47; Desai and Horvitz, (1989) 
Genetics 121:703-21), nIs51 (this work), nIs67 (this 
work). We also used the following marker mutations, 
described by Wood (Wood, ed. (1988) Cold Spring Harbor, 
New York: Cold Spring Harbor Laboratory): (LGI) 
unc-13(e1091); (LGV) unc-42(e270), lin-25(n545), 
him-5(e1467); (LGX) lin-15(n765). 

2. The genetic map position of egl-10.

egl-10 had previously been mapped between rol-4
and lin-25 on chromosome V (Trent et al., (1983) Genetics
104:619-647; Desai and Horvitz, (1989) Genetics
121:703-21). We characterized four Tc1 transposon
insertions found in this interval in the Bergerac strain
of C. elegans, but not in the standard Bristol (N2)
strain: nP63, nP64, arP4 and arP5 (first identified by
Tuck and Greenwald, (1995) Genes & Development
9:341-57). From heterozygotes of the genotype
egl-10(n692)/rol-4(sc8) nP63 nP64 arP4 arP5 lin-25(n545)
him-5(e1467), Rol non-Lin recombinants were selected.
Strains homozygous for the recombinant chromosomes were
assayed for the Egl-10 phenotypes (sluggish movement and
defective egg-laying), and for the presence of each of
the transposons by probing Southern blots of genomic DNA
with appropriate genomic clones. Nine recombination
breakpoints were thus found to distribute as follows:
rol-4 (2/9) nP63 (0/9) nP64 (1/9) egl-10 (1/9) arP4 (1/9)
arP5 (4/9) lin-25. These data place the egl-10 gene in
the interval between nP64 and arP4 (Figure 1A).
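
The breakpoint tally above can be restated as a short sketch. The marker order and the counts are taken from the text; the function names and the idea of reading the breakpoint fraction as a rough relative position within the rol-4 to lin-25 interval are illustrative assumptions, and with only nine recombinants any such estimate is coarse.

```python
# Marker order on chromosome V and breakpoint counts per adjacent interval,
# copied from the nine Rol non-Lin recombinants described in the text.
MARKERS = ["rol-4", "nP63", "nP64", "egl-10", "arP4", "arP5", "lin-25"]
BREAKPOINTS = [2, 0, 1, 1, 1, 4]  # between MARKERS[i] and MARKERS[i + 1]


def flanking_markers(markers, locus):
    """Return the markers immediately left and right of `locus`."""
    i = markers.index(locus)
    return markers[i - 1], markers[i + 1]


def fractional_position(markers, breakpoints, locus):
    """Fraction of all breakpoints that fall to the left of `locus`.

    Hypothetical helper: a crude relative position, not a map distance.
    """
    i = markers.index(locus)
    return sum(breakpoints[:i]) / sum(breakpoints)


left, right = flanking_markers(MARKERS, "egl-10")
position = fractional_position(MARKERS, BREAKPOINTS, "egl-10")
print(f"egl-10 lies between {left} and {right}")
print(f"{sum(BREAKPOINTS)} breakpoints; fraction left of egl-10 = {position:.2f}")
```
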

3. goa-1; egl-10 double mutants.

goa-1; egl-10 strains were constructed by using
the unc-13(e1091) mutation, which lies within 80 kb of
the goa-1 gene (Maruyama and Brenner, (1991) Proc.
Nat'l. Acad. Sci. USA 88:5729-33), to balance the goa-1
mutations. unc-13/+; egl-10/+ males were mated to goa-1
hermaphrodites and hermaphrodite cross progeny were
placed individually on separate plates. unc-13/goa-1;
egl-10/+ animals were recognized as segregating 1/4 Unc
(uncoordinated) and ~1/4 Egl (egg-laying defective)
progeny. Among these progeny, Egl non-Unc animals were
picked to separate plates, and were judged to be of
genotype goa-1/unc-13; egl-10 if they segregated 1/4 Unc
and >3/4 Egl progeny. Non-Unc progeny were picked
individually to separate plates, and goa-1; egl-10
animals were recognized as never segregating Unc progeny.
The following double mutant strains were constructed:
MT8589 goa-1(n1134); egl-10(n990), MT8593 goa-1(n363);
egl-10(n990), MT8641 goa-1(n363); egl-10(n944), MT8587
goa-1(n1134); egl-10(n944), goa-1(n363); egl-10(md176).

Animals with reduction of function mutations in
both goa-1 and egl-10 display a behavioral phenotype that
is very similar to that of strains with mutations in
goa-1 alone, i.e. the animals have hyperactive locomotion
and precocious egg-laying. This observation implies that
EGL-10 protein acts either before or at the same step in
the G-protein regulatory pathway as the GOA-1 protein, Goα.



4. Germline transformation and chromosomal
integration of egl-10 transgenes.

Germline transformation (Mello et al., (1991)
Embo. J. 10:3959-70) was performed by coinjecting the
experimental DNA (80 μg/ml) and the lin-15 rescuing
plasmid pL15EK (Clark et al., (1994) Genetics
137:987-97) into animals carrying the lin-15(n765) marker
mutation. Transgenic animals typically carry coinjected
DNAs as semistable extrachromosomal arrays (Mello et al.,
(1991) Embo. J. 10:3959-70) and are identified by rescue
of the temperature sensitive multivulva phenotype
conferred by the lin-15(n765) mutation. For egl-10
rescue experiments, animals of the genotype egl-10(n692);
lin-15(n765) were injected, and transgenic lines were
considered rescued if >90% of the non-multivulva animals
did not show the egg laying defective phenotype conferred
by the egl-10(n692) mutation. Plasmid pMK120 contains a
15 kb SmaI-FspI fragment of cosmid W08H11, containing the
entire egl-10 gene, into which the self-annealed
oligonucleotide 5'-GTGCTAGCACTGCA-3' (SEQ ID NO: 35) was
inserted at the unique PstI site, thus disrupting the
open reading frame of the fourth egl-10 exon. pMK121 was
generated by digesting pMK120 with PstI and ligating,
thus precisely removing the oligonucleotide and restoring
the egl-10 open reading frame. egl-10 was rescued in all
13 transgenic lines carrying pMK121 that were generated,
while 0/17 pMK120 lines showed egl-10 rescue of even a
single animal (Fig. 1B).
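
The behaviour of the self-annealed oligonucleotide can be checked with a minimal sketch. The oligonucleotide sequence and the PstI site (CTGCAG, cut after the A to leave a 3' TGCA overhang) are as stated or generally known; the reading of the disruption as a frameshift from a 14 bp insertion is an inference for illustration, and all function and variable names are hypothetical.

```python
OLIGO = "GTGCTAGCACTGCA"  # self-annealed oligonucleotide, SEQ ID NO: 35
PSTI_SITE = "CTGCAG"      # PstI cuts CTGCA^G, leaving a 3' TGCA overhang


def reverse_complement(seq):
    pairs = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(pairs[base] for base in reversed(seq))


def find_sites(seq, site):
    """Start indices of every occurrence of `site` in `seq`."""
    return [i for i in range(len(seq) - len(site) + 1) if seq.startswith(site, i)]


# The first 10 nt are palindromic, so two copies anneal into a duplex
# with a TGCA 3' overhang at each end, compatible with PstI-cut DNA.
core, overhang = OLIGO[:10], OLIGO[10:]
self_anneals = core == reverse_complement(core)

# Top strand after ligation into a cut PstI site: CTGCA + oligo + G.
ligated = PSTI_SITE[:5] + OLIGO + PSTI_SITE[5:]
sites = find_sites(ligated, PSTI_SITE)

# A net insertion whose length is not a multiple of 3 shifts the reading frame.
frame_offset = len(OLIGO) % 3

# Cutting at both regenerated sites and religating restores the original site.
restored = ligated[: sites[0] + 5] + ligated[sites[1] + 5 :]

print(self_anneals, overhang, sites, frame_offset, restored)
```

Under these assumptions the insert is flanked by two regenerated PstI sites, which is consistent with its precise removal by PstI digestion and religation as described for pMK121.
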

5. egl-10 cDNAs and the genomic structure.

An 8.5 kb ApaI-MscI fragment, encompassing the
middle half of the egl-10 rescuing genomic clone pMK120,
was used to screen 3.7 x 10^6 plaques from four different
C. elegans cDNA libraries (Barstead and Waterston, (1989)
J. Biol. Chem. 264:10177-85; Maruyama and Brenner, (1992)
Gene 120:135-41; Okkema and Fire, (1994) Development
120:2175-86). Thirteen egl-10 cDNAs were isolated, the
longest of which was 3.2 kb. This cDNA was completely
sequenced on both strands using an ABI 373A DNA sequencer
(Applied Biosystems, Inc.). The sequence data were
compiled on a Sun workstation running software as
described by Dear and Staden (Dear and Staden, (1991)
Nucleic Acids Research 19:3907-11) and displayed in Fig.
2A. The regions of the pMK120 genomic clone to which this
cDNA hybridized were also sequenced on one strand, and
the egl-10 genomic structure was deduced by comparing the
cDNA and genomic sequences (Fig. 2B). The 3.2 kb cDNA
was judged to be full length since it contains a sequence
matching the C. elegans trans-spliced leader SL1 (Krause
and Hirsh, (1987) Cell 49:753-61) at its 5' end, a
poly(A) tract at its 3' end (although it lacks a
consensus poly(A) addition signal), and matches the
length of the 3.2 kb RNA detected by Northern
hybridization (Figure 2C). Other cDNAs were shorter but
colinear with the 3.2 kb cDNA clone as judged by
restriction mapping and end sequencing.

6. egl-10 mutant DNAs.

egl-10 genomic DNA was PCR amplified from egl-10
mutants in ~1 kb sections using primers designed from the
egl-10 genomic sequence. The PCR products were
electrophoresed on agarose gels, and the excised PCR



WO 96/38462 PCT/US96/08295 

- 38 - 

fragments were purified from the agarose by treatment
with β-agarase (New England Biolabs) and isopropanol
precipitation. The purified PCR products were directly
sequenced using the primers that were used to amplify
them, as well as primers that annealed to internal sites.
Any differences from the wild-type sequence were
confirmed by reamplification and resequencing of the site
in question. In this way the entire egl-10 coding
sequence as well as sequence 20 bp into each egl-10
intron was determined for each of ten ethyl
methanesulphonate (EMS)-induced egl-10 alleles (Trent et
al., (1983) Genetics 104:619-647; Desai and Horvitz,
(1989) Genetics 121:703-21), as well as for the
spontaneous allele md1006. The alterations discovered are
listed in Fig. 2D. One EMS-induced egl-10 allele, n953,
appeared to contain no alterations from wild type in the
region sequenced, but may contain alterations in other
parts of the gene. md1006 contains no sequence
alterations from wild type other than the insertion of a
Tc1 transposon at codon 515.

Genomic DNA from each of five spontaneous egl-10
alleles was analyzed by Southern blotting and probing
with clones spanning the egl-10 gene. md1006 contains a
1.6 kb insert relative to wild type which was shown to be
a Tc1 transposon insertion by PCR amplification using
primers that anneal to the Tc1 ends with primers that
anneal to egl-10 sequences flanking the insertion site,
and by further sequencing these PCR products. The four
other spontaneous alleles each contain multiple
restriction map abnormalities spanning the entire egl-10
locus, and each failed to give PCR amplification products
using one or more primer pairs from the egl-10 gene.
None of these alleles appears to be due to a simple
insertion or deletion, and we suspect more complex
rearrangements may have occurred.

7. Localization of EGL-10 protein in neural
processes and subcellular regions of body wall muscle
cells.

We raised polyclonal antibodies against
recombinant EGL-10 protein. When affinity-purified,
these antibodies recognized two major proteins on western
blots of total C. elegans proteins (Fig. 6A). The larger
of these proteins is the product of the egl-10 gene,
since this protein was absent from extracts of the egl-10
null mutant md176 (Fig. 6A), as well as from extracts of
12 other egl-10 mutants. This larger protein was
detected at a reduced abundance in the weak egl-10 mutant
n480 and was present at normal abundance in egl-10(n1125)
animals, which carry a missense mutation that alters
amino acid 446. The 47 kD protein recognised by the
anti-EGL-10 antibodies is not affected by egl-10
mutations and thus is not encoded by the egl-10 gene
(Fig. 6A).

We stained wild-type and egl-10 mutant worms with
the affinity-purified anti-EGL-10 antibodies. We
observed staining in the nerve ring (Fig. 6B), ventral
nerve cord (Fig. 6D), and dorsal nerve cord (not shown)
of wild-type animals, but saw no neural staining in
egl-10 mutants (Fig. 6C). The stained structures consisted
of bundles of neural processes and were at the locations
of the majority of the chemical synapses in the animal
(White et al., Phil. Trans. R. Soc. Lond. B 314:1-340,
1986). In neurons EGL-10 protein appeared to be
localized exclusively to processes; no staining was seen
in the neural cell bodies of wild-type animals. Animals
at all stages of development from first-stage larvae to
adults showed similar staining of neural processes. The
localization of EGL-10 protein to structures in which
chemical synapses are made is consistent with a role for
EGL-10 in intercellular signalling.

We also used the EGL-10 antibodies to stain worms
that overexpress EGL-10 from a multicopy array of egl-10
transgenes (Figs. 6F, 6G). EGL-10 was detected in neural
cell bodies as well as neural processes of these animals,
either because overexpression raised the level of EGL-10
protein in cell bodies above the threshold of detection
or because overexpression of EGL-10 exceeded the capacity
of neurons to localize the protein to processes. Figure
6F shows that a large number of neurons in the major
ganglia of the head region expressed EGL-10. In
addition, our examination of the ventral cord neurons,
lateral neurons, and tail ganglia suggested that most if
not all neurons in C. elegans expressed EGL-10. In
particular, the HSN motor neurons, which control
egg-laying behavior and appear to be functionally defective
in egl-10 mutants, expressed EGL-10 (Fig. 6F).
A second staining pattern present in wild-type
animals consisted of spots arranged in linear arrays
within the body-wall muscle cells (Fig. 6D). Although
this staining was not absent from egl-10 null mutants, we
nevertheless believe that the EGL-10 protein is localized
to these muscle structures, since the muscle stain was
more intense in EGL-10 overexpressing animals and was
reproduced by egl-10::gfp transgenes (see below). The
residual antibody stain seen in the muscles of egl-10
mutants may have been caused by the presence of a
cross-reactive protein (perhaps the 45 kD protein detected in
our western blots) that is colocalized with EGL-10. The
body-wall muscles are used in locomotion behavior (Wood
et al., The Nematode Caenorhabditis elegans, Cold Spring
Harbor, New York, Cold Spring Harbor Laboratory Press,
1988), the frequency of which is controlled by egl-10.
Every body wall muscle cell stained, but no staining was
detected in other types of muscle cells, even in animals
overexpressing EGL-10. The body-wall muscle stain
superimposed on structures visible in Nomarski optics
called dense bodies, which function as attachment sites
between the body-wall muscles and the cuticle that
surrounds them (Wood et al., supra). Each dense body is
flanked by membranes of the sarcoplasmic reticulum, and
our observations at the light microscope level cannot
distinguish between localization of the stain to the
dense bodies or to the sarcoplasmic reticulum. The
significance of the localization of EGL-10 to these
structures is unclear.

Transgenic animals carrying fusions of the egl-10
promoter and N-terminal coding sequences to the
fluorescent reporter protein GFP (Chalfie et al., Science
263:802-805, 1994) showed GFP fluorescence in body-wall
muscle cells in the same pattern seen in animals stained
with the EGL-10 antibody (Fig. 6E). These experiments
demonstrated that the N-terminal 122 amino acids of
EGL-10, when fused to GFP, were sufficient to localize the
fusion protein to the dense body-sarcoplasmic
reticulum-like structures. The EGL-10::GFP fusion proteins were
also expressed in neurons but, like overexpressed
full-length EGL-10 protein, were not tightly localized to
processes, preventing us from identifying the regions of
EGL-10 responsible for localization of EGL-10 to neural
processes.

8. EGL-10 is similar to Sst2p, a negative
regulator of G protein signalling in yeast.

The 555 amino acid EGL-10 protein contains a
120-amino acid region near its carboxy-terminus with
similarity to several proteins in the sequence databases
(Fig. 3A). The similarities with the C. elegans C05B5.7
protein and the BL34/1R20 and G0S8 proteins extend across
the entire 120-amino acid region; this region is 34-55%
identical in pairwise comparisons among EGL-10 and these
other proteins. An additional C. elegans protein,
C29H12.3, consists almost entirely of two highly diverged
repeats of this domain. The first 43 and last 29 amino
acids of the conserved 120-amino acid region are similar
to sequences found in the yeast protein Sst2p and the
Aspergillus nidulans protein FlbA. Sst2p and FlbA are
30% identical to each other over their entire lengths and
show higher conservation in several short regions (Fig.
3A); it is two of these more highly conserved regions
that show similarity to the conserved domain found in
EGL-10, C05B5.7, BL34/1R20, G0S8 and C29H12.3.
Alignments of all of these conserved sequences are shown
in Fig. 3B. This figure also shows alignments with the
sequences of nine additional mammalian EGL-10 protein
homologs whose isolation is described below.
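
Pairwise identity figures of the kind quoted above can be computed from an alignment along the following lines. This is a minimal sketch under stated assumptions: the two aligned fragments are made-up toy strings rather than sequences from Fig. 3, and identity is taken over aligned, non-gap columns, which is only one of several conventions and may not be the one used for the percentages in the text.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identical residues over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != gap and b != gap]
    if not columns:
        return 0.0
    matches = sum(a == b for a, b in columns)
    return 100.0 * matches / len(columns)


# Hypothetical aligned fragments in one-letter code, for illustration only.
toy_a = "LWEDSFEELL"
toy_b = "LWEDAFDE-L"
identity = percent_identity(toy_a, toy_b)
print(f"{identity:.1f}% identical over ungapped columns")
```
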

The similarity of EGL-10 to Sst2p is of particular
interest, since Sst2p functions as a regulator of the G
protein-mediated pheromone response pathway in yeast
(reviewed by Sprague and Thorner, Cold Spring Harbor, New
York, Cold Spring Harbor Laboratory Press, pp. 657-744,
1992; and Kurjan, J., Annu. Rev. Genet. 27:147-179,
1993). We concluded from this that EGL-10 and Sst2p are
members of an evolutionarily conserved family of regulators
of G protein signalling.

Little has been previously known about the
functions of the other genes that have sequence
similarity to egl-10. flbA mutants of Aspergillus
nidulans are defective in the development of
conidiophores, specialized spore-bearing structures (Lee
and Adams, Mol. Microbiol. 14:323-334, 1994). The
C05B5.7 and C29H12.3 genes were identified by the C.
elegans genome sequencing project (Wilson et al., supra).
BL34/1R20 is a human gene expressed specifically in
activated B lymphocytes (Murphy and Norton, Biochim.
Biophys. Acta 1049:261-271, 1990; Hong et al., J. Immun.
150:3895-3904, 1993; Newton et al., Biochim. Biophys.
Acta 1216:314-316, 1993). G0S8 is a human gene
identified by a clone from a blood monocyte cDNA library
(Siderovski et al., DNA Cell. Biol. 13:125-147, 1994).

B. rgs genes: Mammalian homologs of egl-10.
1. Isolation of rgs genes.

Degenerate oligonucleotide primers were designed
to encode the amino acid sequences of the EGL-10,
1R20/BL34 and G0S8 proteins at the positions indicated in
Figure 3B. Two 5' primer pools were used with two 3'
primer pools in all four possible combinations. The
primers contained the base inosine (I) at certain
positions to allow promiscuous base pairing.

The 5' primers were:
5E: G(G/A)IGA(G/A)AA(T/C)(A/T/C)TIGA(G/A)TT(T/C)TGG (SEQ ID NO: 2);
5R: G(G/A)IGA(G/A)AA(T/C)(A/T/C)TI(A/C)GITT(T/C)TGG (SEQ ID NO: 3).

The 3' primers were:
3T: G(G/A)TAIGA(G/A)T(T/C)ITT(T/C)T(T/C)CAT (SEQ ID NO: 4);
3A: G(G/A)TA(G/A)CT(G/A)T(T/C)ITT(T/C)T(T/C)CAT (SEQ ID NO: 5).
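
The complexity of each primer pool follows directly from the notation above. The sketch below parses the four primers as written, counts positions, and multiplies the number of alternatives at each mixed position; inosine is treated as a single base rather than as a source of degeneracy, which is an assumption of this illustration. The names `parse`, `degeneracy` and `PRIMERS` are hypothetical.

```python
import re
from itertools import product

# Primer pools in the notation used in the text; I denotes inosine.
PRIMERS = {
    "5E": "G(G/A)IGA(G/A)AA(T/C)(A/T/C)TIGA(G/A)TT(T/C)TGG",
    "5R": "G(G/A)IGA(G/A)AA(T/C)(A/T/C)TI(A/C)GITT(T/C)TGG",
    "3T": "G(G/A)TAIGA(G/A)T(T/C)ITT(T/C)T(T/C)CAT",
    "3A": "G(G/A)TA(G/A)CT(G/A)T(T/C)ITT(T/C)T(T/C)CAT",
}

# Either a parenthesised set of alternatives or a single base.
TOKEN = re.compile(r"\(([ACGT/]+)\)|([ACGTI])")


def parse(primer):
    """Return a list of positions, each a list of the bases allowed there."""
    return [group.split("/") if group else [single]
            for group, single in TOKEN.findall(primer)]


def degeneracy(primer):
    """Number of distinct oligonucleotides in the pool."""
    count = 1
    for choices in parse(primer):
        count *= len(choices)
    return count


for name, seq in PRIMERS.items():
    print(f"{name}: {len(parse(seq))} nt, {degeneracy(seq)}-fold degenerate")

# The four 5'/3' pool combinations used for amplification.
combinations = list(product(("5E", "5R"), ("3T", "3A")))
print(combinations)
```
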



Amplification conditions were optimized by using
C. elegans genomic DNA as a template and varying the
annealing temperature while holding all other conditions
fixed. Conditions were thus chosen which amplified the
egl-10 gene efficiently while allowing the amplification
of only a small number of other C. elegans genomic
sequences. Amplification reactions for rat brain cDNA
were carried out in 50 μl containing 10 mM Tris-HCl (pH
8.3), 50 mM KCl, 1.5 mM MgCl2, 0.001% gelatin, 200 μM
each of dATP, dCTP, dGTP, and dTTP, 1 U Taq polymerase, 2
μM each PCR primer pool, and 1.5 ng rat brain cDNA as a
template (purchased from Clontech). The optimized
reaction conditions were as follows: initial denaturation
at 95°C for 3 min., followed by 40 cycles of 40°C for 1
min., 72°C for 2 min., 94°C for 45 sec, and a final
incubation of 72°C for 5 min. After this initial
amplification some primer pairs gave detectable products
of ~240 bp. 2 μl of each initial amplification reaction
was used as a template for further 40 cycle amplification
reactions under the same conditions; all primer pairs
gave a detectable ~240 bp product after the second round
of amplification. The ~240 bp PCR products were subcloned
into EcoRV cut pBluescript (Stratagene) treated with Taq
polymerase and dTTP, generating clone libraries for
amplifications from each of the four primer pairs. Clones
from each library were analyzed as follows: after
digestion with the enzymes Stu I, Bgl II, Sty I, Nco I,
Pst I, and PpuM I, clones were divided into classes with
different restriction maps and several clones from each
restriction map class were sequenced using an ABI 373A
DNA sequencer (Applied Biosystems, Inc.). A total of 121
clones were restriction mapped, of which 47 were
sequenced.
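
The cycling program described above can be written down as data, which makes the programmed hold time per round easy to check. The temperatures, durations and cycle count come from the text; the `Step` type and `total_seconds` helper are hypothetical, and ramp times between temperatures are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    temp_c: float
    seconds: int


INITIAL_DENATURATION = Step(95, 180)
CYCLE = (
    Step(40, 60),   # low-stringency annealing for the degenerate primers
    Step(72, 120),  # extension
    Step(94, 45),   # denaturation
)
N_CYCLES = 40
FINAL_EXTENSION = Step(72, 300)


def total_seconds(initial, cycle, n_cycles, final):
    """Programmed hold time for one round, excluding temperature ramps."""
    return initial.seconds + n_cycles * sum(s.seconds for s in cycle) + final.seconds


round_seconds = total_seconds(INITIAL_DENATURATION, CYCLE, N_CYCLES, FINAL_EXTENSION)
print(f"one round: {round_seconds / 60:.0f} min; two rounds: {2 * round_seconds / 60:.0f} min")
```
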

With this approach, we identified nine genes,
called rgss-1 through rgss-9 for regulator of G-protein
signalling similarity genes, from rat brain cDNA. Their
DNA sequences are displayed in Fig. 3B and their amino
acid sequences in Figure 3B (labelled as rat gene
fragments 3 through 11, SEQ ID NOS 15-23). Each of the
rat rgs fragments was isolated at least twice. Three of
the four primer pairs used identified a gene that was not
identified by any of the other primer pairs. Thus we
appear to have identified all or nearly all the rgs genes
that can be amplified from rat brain cDNA using these
primer pairs.

C. Human rgs genes. 
We identified additional human genes encoding RGS
domains by searching a database of expressed sequence
tags. This search identified matches to five previously
defined genes (including BL34/1R20 and G0S8) and
apparent human orthologs of the rat rgs1, rgs6, and rgs2
genes, as well as partial sequences of four new genes,
which we have named RGS12 through RGS15.

Human RGS2 shares sequence similarity with EGL-10
outside of the RGS domain, unlike other RGS domain
proteins for which extended sequences are available. We
therefore obtained and determined the sequence of a human
rgs2 cDNA (Fig. 7, SEQ ID NO:41). While incomplete at
its 5' end, this 1.9 kb cDNA contains a 420-codon open
reading frame that encodes a protein with similarity to
EGL-10 throughout its length (Figure 3C; SEQ ID NO: 40).
The predicted RGS2 protein is 53% identical to EGL-10,
with the highest conservation (75% identity) occurring in
the N-terminal 174 amino acids of the human RGS2
sequence. The 119-amino acid RGS domain of human RGS2,
by contrast, is 46% identical to the corresponding
C-terminal region of EGL-10. EGL-10 contains a 79 amino
acid serine/alanine rich insertion relative to human RGS2
between these conserved amino- and C-terminal regions.
The conserved N-terminal region of EGL-10 functions to
localize the protein within muscle cells, and the
corresponding region of RGS2 may play a similar role for
human RGS2 intracellular localization. It is possible
that RGS2 is the human protein most similar to EGL-10. As
a result, human RGS2 is likely to play a functional role
analogous to EGL-10 in regulating signaling by Go.

Characterization of rat rgs genes.

Southern blots of rat genomic DNA were probed at
high stringency with labelled subclones for each of the
nine rgs gene PCR fragments. Each probe detected at
least one different genomic EcoRI fragment and gave
signals of comparable intensity, suggesting that each
rgs PCR product is derived from a single copy gene in the
rat genome.

Labelled rgs gene probes were serially hybridized
to a Northern blot (purchased from Clontech) bearing 2 μg
of poly(A)+ RNA from each of various rat tissues
(allowing time for the radioactive signals to decay
between probings). A human β-actin cDNA probe was used
to control for loading of RNA. The results indicate that
rgs genes are widely and differentially expressed in rat
tissues (Figure 4). This result implies additional rgs
genes could be identified by using the same primer pairs
with cDNA from other rat tissues, with human cDNAs or
with cDNAs from other species. In addition, it is very
likely that additional rgs genes could be identified
using alternate primers, based on different amino acid
sequences that are conserved not only in the EGL-10,
BL34/1R20, and G0S8 proteins, but also in the conceptual
protein encoded by C05B5.7, the SST2 and FlbA proteins
and in the proteins encoded by the rgs genes identified
so far.

What is claimed is:

SEQUENCE LISTING



(1) GENERAL INFORMATION: 

(i) APPLICANT: Massachusetts Institute of Technology 
(ii) TITLE OF INVENTION: REGULATORS OF G-PROTEIN SIGNALLING
(iii) NUMBER OF SEQUENCES: 41 

(iv) CORRESPONDENCE ADDRESS: 

(A) ADDRESSEE: Fish & Richardson P.C.

(B) STREET: 225 Franklin Street 

(C) CITY: Boston 

(D) STATE: MA 

(E) COUNTRY: USA

(F) ZIP: 02110-2804 

(v) COMPUTER READABLE FORM: 

(A) MEDIUM TYPE: Floppy disk 

(B) COMPUTER: IBM PC compatible 

(C) OPERATING SYSTEM: PC-DOS/MS-DOS

(D) SOFTWARE: PatentIn Release #1.0, Version #1.30

(vi) CURRENT APPLICATION DATA: 

(A) APPLICATION NUMBER: PCT/US96 / 

(B) FILING DATE: 31-MAY-1996 

(C) CLASSIFICATION: 

(vii) PRIOR APPLICATION DATA:

(A) APPLICATION NUMBER: 08/588,258 

(B) FILING DATE: 12-JAN-96 

(C) CLASSIFICATION: 

(viii) ATTORNEY/AGENT INFORMATION:

(A) NAME: Bieker-Brady, Kristina

(B) REGISTRATION NUMBER: 39,109

(C) REFERENCE/DOCKET NUMBER: 01997/216001

(ix) TELECOMMUNICATION INFORMATION: 

(A) TELEPHONE: 617/542-5070 

(B) TELEFAX: 617/542-8906

(C) TELEX: 200154



(2) INFORMATION FOR SEQ ID NO: 1:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 123 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: not relevant

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 1:

Leu Trp Glu Asp Ser Phe Glu Glu Leu Leu Ala Asp Ser Ser Leu Gly
1 5 10 15

Arg Glu Thr Leu Gln Lys Phe Leu Asp Lys Glu Tyr Ser Gly Glu Asn
20 25 30

Leu Arg Phe Trp Trp Glu Val Gln Lys Leu Leu Arg Lys Cys Ser Ser
35 40 45

Arg Arg Met Val Pro Val Met Val Thr Glu Ile Tyr Asn Glu Phe Ile
50 55 60

Asp Thr Asn Ala Ala Thr Ser Pro Val Asn Val Asp Cys Lys Val Met
65 70 75 80

Glu Val Thr Glu Asp Asn Leu Lys Asn Pro Asn Arg Trp Ser Phe Asp
85 90 95

Glu Ala Ala Asp His Ile Tyr Cys Leu Met Lys Asn Asp Ser Tyr Gln
100 105 110

Arg Phe Leu Arg Ser Glu Ile Tyr Lys Asp Leu
115 120

(2) INFORMATION FOR SEQ ID NO: 2: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(ix) FEATURE:

(A) NAME/KEY: Modified-site

(D) OTHER INFORMATION: N is Inosine.

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2:

GNNGANAARY TNGANTTRTG G 21

(2) INFORMATION FOR SEQ ID NO: 3: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(ix) FEATURE:

(A) NAME/KEY: Modified-site

(D) OTHER INFORMATION: N is Inosine. 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3: 

GNNGANAARY TNSGTTRTGG 20 

(2) INFORMATION FOR SEQ ID NO: 4: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 19 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(ix) FEATURE:

(A) NAME/KEY: Modified-site

(D) OTHER INFORMATION: N is Inosine. 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4: 

GNTANGANTR NTTRTRCAT 19 

(2) INFORMATION FOR SEQ ID NO: 5:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 19 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(ix) FEATURE:

(A) NAME/KEY: Modified-site

(D) OTHER INFORMATION: N is Inosine.

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5: 

GNTANCTNTR NTTRTRCAT 

(2) INFORMATION FOR SEQ ID NO: 6:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 67 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: not relevant

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6:

Ile Ser Cys Glu Glu Tyr Lys Lys Ile Lys Ser Pro Ser Lys Leu Ser
1 5 10 15

Pro Lys Ala Lys Lys Ile Tyr Asn Glu Phe Ile Ser Val Gln Ala Thr
20 25 30

Lys Glu Val Asn Leu Asp Ser Cys Thr Arg Glu Glu Thr Ser Arg Asn
35 40 45

Met Leu Glu Pro Thr Ile Thr Cys Phe Asp Glu Ala Gln Lys Lys Ile
50 55 60

Phe Asn Leu
65



(2) INFORMATION FOR SEQ ID NO: 7:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 66 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: not relevant

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7:

Leu Ala Val Glu Asp Leu Lys Lys Arg Pro Ile Arg Glu Val Pro Ser
1 5 10 15

Arg Val Gln Glu Ile Trp Gln Glu Phe Leu Ala Pro Gly Thr Pro Ser
20 25 30

Ala Ile Asn Leu Asp Ser Lys Ser Tyr Asp Lys Thr Thr Gln Asn Val
35 40 45

Lys Glu Pro Gly Arg Tyr Thr Phe Glu Asp Ala Gln Glu His Ile Tyr
50 55 60

Lys Leu
65

(2) INFORMATION FOR SEQ ID NO: 8: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 67 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: not relevant

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8:

Leu Ala Cys Glu Glu Phe Lys Lys Thr Arg Ser Thr Ala Lys Leu Val
1 5 10 15

Thr Lys Ala His Arg Ile Phe Glu Glu Phe Val Asp Val Asp Ala Pro
20 25 30

Arg Glu Val Asn Ile Asp Phe Gln Thr Arg Glu Ala Thr Arg Lys Asn
35 40 45

Met Gln Glu Pro Ser Leu Thr Cys Phe Asp Gln Ala Gln Gly Lys Val
50 55 60

His Ser Leu
65

(2) INFORMATION FOR SEQ ID NO: 9: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 66 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: not relevant 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9: 

Glu Ala Cys Glu Asp Leu Lys Tyr Gly Asp Gln Ser Lys Val Lys Glu
1 5 10 15

Lys Ala Glu Glu Ile Tyr Lys Leu Phe Leu Ala Pro Gly Ala Arg Arg
20 25 30

Trp Ile Asn Ile Asp Gly Lys Thr Met Asp Ile Thr Val Lys Gly Leu
35 40 45

Arg His Pro His Arg Tyr Val Leu Asp Ala Ala Gln Thr His Ile Tyr
50 55 60

Met Leu
65

(2) INFORMATION FOR SEQ ID NO: 10:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 68 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: not relevant

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10:

Leu Ala Cys Glu Asp Phe Lys Lys Val Lys Ser Gln Ser Lys Met Ala
1 5 10 15

Ala Lys Ala Lys Lys Ile Phe Ala Glu Phe Ile Ala Ile Gln Ala Cys
20 25 30

Lys Glu Val Asn Leu Asp Ser Tyr Thr Arg Glu His Thr Lys Glu Asn
35 40 45

Leu Gln Ser Ile Thr Arg Gly Cys Phe Asp Leu Ala Gln Lys Arg Ile
50 55 60

Phe Phe Gly Leu
65



(2) INFORMATION FOR SEQ ID NO: 11: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 68 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: not relevant

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11:

Val Ala Cys Glu Asn Tyr Lys Lys Ile Lys Ser Pro Ile Lys Met Ala
1 5 10 15

Glu Lys Ala Lys Gln Gln Ile Tyr Glu Glu Phe Ile Gln Thr Glu Ala
20 25 30

Pro Lys Glu Val Asn Ile Asp His Phe Thr Lys Asp Ile Thr Met Lys
35 40 45

Asn Leu Val Glu Pro Ser Pro His Ser Phe Asp Leu Ala Gln Lys Arg
50 55 60

Ile Tyr Ala Leu
65
(2) INFORMATION FOR SEQ ID NO: 12: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 66 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: not relevant 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12:

Leu Ala Val Gln Asp Leu Lys Lys Gln Pro Leu Gln Asp Val Ala Lys
1 5 10 15

Arg Val Glu Glu Ile Trp Gln Glu Phe Leu Ala Pro Gly Ala Pro Ser
20 25 30

Ala Ile Asn Leu Asp Ser His Ser Tyr Glu Ile Thr Ser Gln Asn Val
35 40 45

Lys Asp Gly Gly Arg Tyr Thr Phe Glu Asp Ala Gln Glu His Ile Tyr
50 55 60

Lys Leu
65



(2) INFORMATION FOR SEQ ID NO: 13: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 66 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: not relevant

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13:

Leu Ala Cys Glu Asp Phe Lys Lys Thr Glu Asp Lys Lys Gln Met Gln
1 5 10 15

Glu Lys Ala Lys Lys Ile Tyr Met Thr Phe Leu Ser Asn Lys Ala Ser
20 25 30

Ser Gln Val Asn Val Glu Gly Gln Ser Arg Leu Thr Glu Lys Ile Leu
35 40 45

Glu Glu Pro His Pro Leu Met Phe Gln Lys Leu Gln Asp Gln Ile Phe
50 55 60

Asn Leu
65

(2) INFORMATION FOR SEQ ID NO: 14: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 66 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: not relevant 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14: 

Glu Ala Cys Glu Glu Leu Arg Phe Gly Gly Gln Ala Gln Val Pro Thr
1 5 10 15

Leu Val Asp Ser Val Tyr Gln Gln Phe Leu Ala Pro Gly Ala Ala Arg
20 25 30

Trp Ile Asn Ile Asp Ser Arg Thr Met Glu Trp Thr Leu Glu Gly Leu
35 40 45

Arg Gln Pro His Arg Tyr Val Leu Asp Ala Ala Gln Leu His Ile Tyr
50 55 60

Met Leu
65

(2) INFORMATION FOR SEQ ID NO: 15: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 201 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15: 

ATCAGCTGTG AGGAGTACAA GAAAATCAAA TCACCTTCTA AACTAAGTCC CAAGGCCAAG 60 

AAGATCTACA ATGAGTTCAT CTCTCXGCAG GCAACAA&AG AGGTGAACCT GGATTCTTGC 120 

ACCAUAGAGG AGACAAGCCG GAACAxGixA GAGCCCACGA TaACCTGTxT TGATGAAGCC 180

CGGAAGAAGA TTTTCAACCT G 201 
(2) INFORMATION FOR SEQ ID NO: 16 : 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 198 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16: 

CAGCTTGTAA ATGTGCTCCT GAGCATCTTC GAATGTGTAT CGTCCTGGTT CCTTCACATT 60 

CTGTGTGGTC TTGTCATAAC TCTTCGAATC CAAGTTAATG GCACTGGGGG CCCCCGGAGC 120 

CAGAAATTCT TGCCATATTT CCTGTACTCG AGAGGGGACC TCTCGGATAG GCCTTTTCTT 180 

CAGGTCCTCC ACTGCCAA 198 
(2) INFORMATION FOR SEQ ID NO: 17: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 201 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17: 
CTGGCCTGTG AGGAGTTCAA GAAGACCAGG TCGACTGCAA AGCTAGTCAC CAAGGCCCAC 60 
AGGATCTTTG AGGAGTTTGT GGATGTGCAG GCTCCACGGG AGGTGAATAT CGATTTCCAG 120 
ACCCGAGAGG CCACGAGGAA GAACATGCAG GAGCCGTCCC TGACTTGTTT TGATCAAGCC 180 
CAGGGAAAAG TCCACAGCCT C 201 
(2) INFORMATION FOR SEQ ID NO: 18: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 198 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18: 
GAAGCCTGTG AGGATCTGAA GTATGGGGAT CAGTCCAAGG TCAAGGAGAA GGCAGAGGAG 60 
ATCTACAAGC TGTTCCTGGC ACCGGGTGCA AGGCGATGGA TCAACATAGA CGGCAAAACC 120 
ATGGACATCA CCGTGAAGGG GCTGAGACAC CCCCACCGCT ATGTGTTGGA CGCGGCGCAG 180 
ACCCACATTT ACATGCTC 198 

(2) INFORMATION FOR SEQ ID NO: 19: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 201 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19: 

CTGGCTTGTG AGGATTTCAA GAAGGTCAAA TCGCAGTCCA AGATGGCAGC CAAAGCCAAG 60 

AAGATCTTTG CTGAGTTCAT CGCGATCCAG GCTTGCAAGG AGGTAAACCT GGACTCGTAC 120 

ACACGAGAAC ACACTAAGGA GAACCTGCAG AGCATCACCC GAGGCTGCTT TGACCTGGCA 180 

CAAAAACGTA TCTTCGGGCT C 201 
(2) INFORMATION FOR SEQ ID NO: 20: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 201 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 20: 
GTTGCCTGTG AGAATTACAA GAAGATCAAG TCCCCCATCA AAATGGCAGA GAAGGCAAAG 60 
CAAATCTATG AAGAATTCAT CCAGACAGAG GCCCCTAAAG AGGTGAACAT TGACCACTTC 120 
ACTAAAGACA TCACCATGAA GAACCTGGTG GAACCTTCCC CTCACAGCTT TGACCTGGCC 180 
CAGAAAAGGA TCTACGCCCT G 201 
(2) INFORMATION FOR SEQ ID NO: 21: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 198 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 21: 
CTGGCCGTCC AAGATCTCAA GAAGCAACCT CTACAGGATG TGGCCAAGAG GGTGGAGGAA 60 
ATCTGGCAAG AGTTCCTAGC TCCCGGAGCC CCAAGTGCAA TCAACCTGGA TTCTCACAGC 120 
TATGAGATAA CCAGTCAGAA TGTCAAAGAT GGAGGGAGAT ACACATTTGA AGATGCCCAG 180 
GAGCACATCT ACAAGCTG 198 
(2) INFORMATION FOR SEQ ID NO: 22: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 198 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 22: 

CTAGCGTGTG AAGATTTCAA GAAAACGGAG GACAAGAAGC AGATGCAGGA AAAGGCCAAG 60 

AAGATCTACA TGACCTTCCT GTCCAATAAG GCCTCTTCAC AAGTCAATGT GGAGGGGCAG 120 

TCTCGGCTCA CTGAAAAGAT TCTGGAAGAA CCACACCCTC TGATGTTCCA AAAGCTCCAG 180 

GACCAGATCT TCAATCTC 198 
(2) INFORMATION FOR SEQ ID NO: 23: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 198 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 23: 

GAGGCGTGTG AGGAGCTGCG CTTTGGCGGA CAGGCCCAGG TCCCCACCCT GGTGGACTCT 60 

GTTTACCAGC AGTTCCTGGC CCCTGGAGCT GCCCGCTGGA TCAACATTGA CAGCAGAACA 120 

ATGGAGTGGA CCCTGGAGGG GCTGCGCCAG CCACACCGCT ATGTCCTAGA TGCAGCACAA 180 

CTGCACATCT ACATGCTC 198 

(2) INFORMATION FOR SEQ ID NO: 24: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 555 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 24: 

Met Ala Leu Pro Arg Leu Arg Val Asn Ala Ser Asn Glu Glu Arg Leu 
1 5 10 15 

Val His Pro Asn His Met Val Tyr Arg Lys Met Glu Met Leu Val Asn 
20 25 30 

Gln Met Leu Asp Ala Glu Ala Gly Val Pro Ile Lys Thr Val Lys Ser 
35 40 45 

Phe Leu Ser Lys Val Pro Ser Val Phe Thr Gly Gln Asp Leu Ile Gly 
50 55 60 

Trp Ile Met Lys Asn Leu Glu Met Thr Asp Leu Ser Asp Ala Leu His 
65 70 75 80 

Leu Ala His Leu Ile Ala Ser His Gly Tyr Leu Phe Gln Ile Asp Asp 
85 90 95 

His Val Leu Thr Val Lys Asn Asp Gly Thr Phe Tyr Arg Phe Gln Thr 
100 105 110 

Pro Tyr Phe Trp Pro Ser Asn Cys Trp Asp Pro Glu Asn Thr Asp Tyr 
115 120 125 

Ala Val Tyr Leu Cys Lys Arg Thr Met Gln Asn Lys Ala His Leu Glu 
130 135 140 

Leu Glu Asp Phe Glu Ala Glu Asn Leu Ala Lys Leu Gln Lys Met Phe 
145 150 155 160 

Ser Arg Lys Trp Glu Phe Val Phe Met Gln Ala Glu Ala Gln Tyr Lys 
165 170 175 

Val Asp Lys Lys Arg Asp Arg Gln Glu Arg Gln Ile Leu Asp Ser Gln 
180 185 190 

Glu Arg Ala Phe Trp Asp Val His Arg Pro Val Pro Gly Cys Val Asn 
195 200 205 

Thr Thr Glu Val Asp Phe Arg Lys Leu Ser Arg Ser Gly Arg Pro Lys 
210 215 220 

Tyr Ser Ser Gly Gly His Ala Ala Leu Ala Ala Ser Thr Ser Gly Ile 
225 230 235 240 

Gly Cys Thr Gln Tyr Ser Gln Ser Val Ala Ala Ala His Ala Ser Leu 
245 250 255 

Pro Ser Thr Ser Asn Gly Ser Ala Thr Ser Pro Arg Lys Asn Asp Gln 
260 265 270 

Glu Pro Ser Thr Ser Ser Gly Gly Glu Ser Pro Ser Thr Ser Ser Ala 
275 280 285 

Ala Ala Gly Thr Ala Thr Thr Ser Ala Pro Ser Thr Ser Thr Pro Pro 
290 295 300 

Val Thr Thr Ile Thr Ala Thr Ile Asn Ala Gly Ser Phe Arg Asn Asn 
305 310 315 320 

Tyr Tyr Thr Arg Pro Gly Leu Arg Arg Cys Thr Gln Val Gln Asp Thr 
325 330 335 

Leu Lys Leu Glu Ile Val Gln Leu Asn Ser Arg Leu Ser Lys Asn Val 
340 345 350 

Leu Arg Thr Ser Lys Val Val Glu Asn Tyr Leu Ala Tyr Tyr Glu Gln 
355 360 365 

Arg Arg Val Phe Asp Pro Leu Leu Thr Pro Pro Gly Ser Gln Ala Asp 
370 375 380 

Pro Phe Gln Ser Gln Pro Asn Pro Trp Ile Asn Asp Thr Val Asp Phe 
385 390 395 400 

Trp Gln His Asp Lys Ile Thr Gly Asp Ile Gln Thr Arg Arg Leu Lys 
405 410 415 

Leu Trp Glu Asp Ser Phe Glu Glu Leu Leu Ala Asp Ser Leu Gly Arg 
420 425 430 

Glu Thr Leu Gln Lys Phe Leu Asp Lys Glu Tyr Ser Gly Glu Asn Leu 
435 440 445 

Arg Phe Trp Trp Glu Val Gln Lys Leu Arg Lys Cys Ser Ser Arg Met 
450 455 460 

Val Pro Val Met Val Thr Glu Ile Tyr Asn Glu Phe Ile Asp Thr Asn 
465 470 475 480 

Ala Ala Thr Ser Pro Val Asn Val Asp Cys Lys Val Met Glu Val Thr 
485 490 495 

Glu Asp Asn Leu Lys Asn Pro Asn Arg Trp Ser Phe Asp Glu Ala Ala 
500 505 510 

Asp His Ile Tyr Cys Leu Met Lys Asn Asp Ser Tyr Gln Arg Phe Leu 
515 520 525 

Arg Ser Glu Ile Tyr Lys Asp Leu Val Leu Gln Ser Arg Lys Lys Val 
530 535 540 

Ser Leu Asn Cys Ser Phe Ser Ile Phe Ala Ser 
545 550 555 

(2) INFORMATION FOR SEQ ID NO:25: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 8 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: not relevant 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 



(ix) FEATURE: 

(A) NAME/KEY: Modified-site 

(D) OTHER INFORMATION: Xaa at position 1 is I, L, E, or V, 
preferably L; Xaa at position 2 is A, S, or E, preferably A; 
Xaa at position 3 is C or V, preferably C; Xaa at position 5 
is D, E, N, or K, preferably D; Xaa at position 6 is L, Y, or F; 
Xaa at position 7 is K or R, preferably R; and Xaa at position 8 
is K, Y, R, or F, preferably K. 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 25: 

Xaa Xaa Xaa Glu Xaa Xaa Xaa Xaa 
1 5 
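
The consensus of SEQ ID NO: 25 can be read as a pattern over one-letter residue codes. The following is a minimal sketch and is not part of the original listing. It assumes that position 1 is I, L, E, or V and that position 4 is the fixed Glu; the names `SEQ25_PATTERN` and `matches_seq25` are hypothetical.

```python
import re

# Hypothetical encoding of the SEQ ID NO: 25 consensus (8 residues).
# Assumption: position 1 is I, L, E, or V; position 4 is the fixed Glu (E).
SEQ25_PATTERN = re.compile(
    r"[ILEV]"   # position 1
    r"[ASE]"    # position 2
    r"[CV]"     # position 3
    r"E"        # position 4 (Glu)
    r"[DENK]"   # position 5
    r"[LYF]"    # position 6
    r"[KR]"     # position 7
    r"[KYRF]"   # position 8
)


def matches_seq25(peptide: str) -> bool:
    """Return True if the peptide is exactly 8 residues and fits the consensus."""
    return SEQ25_PATTERN.fullmatch(peptide.upper()) is not None
```

For example, `matches_seq25("LACEDYKK")` tests the stretch Leu Ala Cys Glu Asp Tyr Lys Lys that appears in SEQ ID NO: 31.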

(2) INFORMATION FOR SEQ ID NO: 26: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 11 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: not relevant 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 



(ix) FEATURE: 

(A) NAME/KEY: Modified-site 

(D) OTHER INFORMATION: Xaa at position 1 is F or L, 
preferably F; Xaa at position 2 is D, E, T, or Q, preferably 
D; Xaa at position 3 is E, D, T, Q, A, L, or K; Xaa at position 
4 is A or L, preferably A; Xaa at position 5 is Q or A, preferably 
Q; Xaa at position 6 is L, D, E, K, T, G, or H; Xaa at position 7 
is H, R, K, Q, or D; Xaa at position 8 is I or V, preferably I; 
Xaa at position 9 is Q, T, S, N, K, H, G, or A. 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 26: 

Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Lys 
1 5 10 

(2) INFORMATION FOR SEQ ID NO: 27: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 3169 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(ix) FEATURE: 

(A) NAME/KEY: CDS 

(B) LOCATION: 199..1864 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:27: 

TTTGAGACTT TTGTGGCTCA ACACCTCGTT TCTTTTGCAC CCGAACCGCA CCCACGGTAA 60 

CACGGATTCT GCGAGGAATG AAGGAGTAGA AGATAACGGG ACATTCCCTT GTGTCAAAGT 120 

GAGAGCCAAC GACGACGATC CTAAGAAGTA TAAACTTGGA AGAGTATTCA CAAAAGTCTT 180 

GAAGACTAAA GCTTCACA ATG GCT CTA CCA AGA TTG AGG GTA AAT GCA AGC 231 

Met Ala Leu Pro Arg Leu Arg Val Asn Ala Ser 
1 5 10 

AAC GAG GAG CGT CTT GTA CAT CCA AAC CAC ATG GTG TAC CGT AAG ATG 279 
Asn Glu Glu Arg Leu Val His Pro Asn His Met Val Tyr Arg Lys Met 
15 20 25 

GAG ATG CTT GTC AAT CAA ATG CTT GAT GCA GAA GCT GGT GTT CCA ATC 327 
Glu Met Leu Val Asn Gln Met Leu Asp Ala Glu Ala Gly Val Pro Ile 
30 35 40 

AAG ACT GTC AAG AGT TTT CTG TCA AAA GTT CCA TCT GTA TTC ACC GGA 375 
Lys Thr Val Lys Ser Phe Leu Ser Lys Val Pro Ser Val Phe Thr Gly 
45 50 55 

CAA GAT CTG ATT GGA TGG ATC ATG AAA AAT CTT GAG ATG ACT GAT CTT 423 
Gln Asp Leu Ile Gly Trp Ile Met Lys Asn Leu Glu Met Thr Asp Leu 
60 65 70 75 

TCG GAT GCC CTT CAT CTG GCT CAT CTG ATC GCG TCA CAC GGT TAT CTT 471 
Ser Asp Ala Leu His Leu Ala His Leu Ile Ala Ser His Gly Tyr Leu 
80 85 90 

TTC CAA ATT GAC GAT CAT GTG TTA ACG GTT AAA AAC GAT GGA ACA TTC 519 
Phe Gln Ile Asp Asp His Val Leu Thr Val Lys Asn Asp Gly Thr Phe 
95 100 105 

TAT CGG TTT CAA ACT CCA TAC TTT TGG CCG TCA AAT TGT TGG GAT CCG 567 
Tyr Arg Phe Gln Thr Pro Tyr Phe Trp Pro Ser Asn Cys Trp Asp Pro 
110 115 120 

GAA AAT ACT GAT TAC GCG GTG TAC CTG TGC AAG CGG ACA ATG GAG AAC 615 
Glu Asn Thr Asp Tyr Ala Val Tyr Leu Cys Lys Arg Thr Met Gln Asn 
125 130 135 

AAA GCG CAT TTG GAA CTG GAG GAC TTT GAA GCG GAG AAC CTG GCA AAG 663 
Lys Ala His Leu Glu Leu Glu Asp Phe Glu Ala Glu Asn Leu Ala Lys 
140 145 150 155 

CTG CAG AAG ATG TTC TCG CGC AAG TGG GAA TTT GTG TTC ATG CAA GCC 711 
Leu Gln Lys Met Phe Ser Arg Lys Trp Glu Phe Val Phe Met Gln Ala 
160 165 170 

GAA GCT CAA TAC AAG GTC GAC AAG AAG CGA GAT CGC CAG GAG CGC CAA 759 
Glu Ala Gln Tyr Lys Val Asp Lys Lys Arg Asp Arg Gln Glu Arg Gln 
175 180 185 

ATT CTT GAC AGT CAG GAA CGT GCT TTC TGG GAT GTT CAT CGT CCA GTG 807 
Ile Leu Asp Ser Gln Glu Arg Ala Phe Trp Asp Val His Arg Pro Val 
190 195 200 

CCA GGA TGT GTA AAC ACT ACA GAA GTC GAC TTC CGG AAG CTT TCA CGG 855 
Pro Gly Cys Val Asn Thr Thr Glu Val Asp Phe Arg Lys Leu Ser Arg 
205 210 215 

TCT GGA AGG CCC AAG TAC AGT AGT GGA GGA CAC GCA GCA TTG GCC GCT 903 
Ser Gly Arg Pro Lys Tyr Ser Ser Gly Gly His Ala Ala Leu Ala Ala 
220 225 230 235 

TCA ACG TCG GGT ATC GGT TGC ACT CAG TAT TCA CAA AGT GTG GCA GCA 951 
Ser Thr Ser Gly Ile Gly Cys Thr Gln Tyr Ser Gln Ser Val Ala Ala 
240 245 250 

GCT CAT GCG AGT CTT CCA TCA ACA TCA AAT GGG AGT GCA ACA TCT CCA 999 
Ala His Ala Ser Leu Pro Ser Thr Ser Asn Gly Ser Ala Thr Ser Pro 
255 260 265 

AGA AAG AAC GAT CAG GAG CCA TCA ACA TCA AGT GGG GGT GAA TCT CCA 1047 
Arg Lys Asn Asp Gln Glu Pro Ser Thr Ser Ser Gly Gly Glu Ser Pro 
270 275 280 

TCA ACA TCG TCT GCT GCT GCT GGA ACT GCC ACA ACA TCT GCA CCA TCA 1095 
Ser Thr Ser Ser Ala Ala Ala Gly Thr Ala Thr Thr Ser Ala Pro Ser 
285 290 295 

ACA TCA ACG CCT CCG GTG ACA ACT ATT ACT GCA ACG ATA AAT GCA GGA 1143 
Thr Ser Thr Pro Pro Val Thr Thr Ile Thr Ala Thr Ile Asn Ala Gly 
300 305 310 315 

TCA TTC CGA AAT AAC TAT TAC ACA AGA CCT GGA TTA CGG CGG TGT ACA 1191 
Ser Phe Arg Asn Asn Tyr Tyr Thr Arg Pro Gly Leu Arg Arg Cys Thr 
320 325 330 

CAA GTA CAG GAT ACG TTA AAA CTG GAA ATT GTG CAA TTG AAT AGT CGA 1239 
Gln Val Gln Asp Thr Leu Lys Leu Glu Ile Val Gln Leu Asn Ser Arg 
335 340 345 

TTA TCA AAA AAT GTA TTA CGT ACA TCT AAA GTT GTA GAA AAT TAT TTG 1287 
Leu Ser Lys Asn Val Leu Arg Thr Ser Lys Val Val Glu Asn Tyr Leu 
350 355 360 

GCA TAT TAC GAA CAA CGT CGA GTA TTT GAT CCA CTG TTA ACG CCT CCT 1335 
Ala Tyr Tyr Glu Gln Arg Arg Val Phe Asp Pro Leu Leu Thr Pro Pro 
365 370 375 

GGA TCT CAG GCT GAT CCT TTT CAA TCA CAG CCT AAT CCA TGG ATT AAC 1383 
Gly Ser Gln Ala Asp Pro Phe Gln Ser Gln Pro Asn Pro Trp Ile Asn 
380 385 390 395 

GAT ACT GTT GAT TTT TGG CAA CAT GAT AAA ATT ACG GGA GAC ATC CAA 1431 
Asp Thr Val Asp Phe Trp Gln His Asp Lys Ile Thr Gly Asp Ile Gln 
400 405 410 

ACC CGC CGA CTC AAG CTT TGG GAG GAT AGT TTT GAA GAA TTA CTT GCT 1479 
Thr Arg Arg Leu Lys Leu Trp Glu Asp Ser Phe Glu Glu Leu Leu Ala 
415 420 425 

GAT TCA TTA GGT CGA GAA ACT CTT CAA AAA TTC CTT GAC AAA GAA TAT 1527 
Asp Ser Leu Gly Arg Glu Thr Leu Gln Lys Phe Leu Asp Lys Glu Tyr 
430 435 440 

TCT GGA GAA AAC TTG CGG TTT TGG TGG GAG GTA CAA AAG CTG CGA AAG 1575 
Ser Gly Glu Asn Leu Arg Phe Trp Trp Glu Val Gln Lys Leu Arg Lys 
445 450 455 

TGC AGT TCA AGA ATG GTT CCA GTT ATG GTA ACA GAG ATT TAC AAC GAG 1623 
Cys Ser Ser Arg Met Val Pro Val Met Val Thr Glu Ile Tyr Asn Glu 
460 465 470 475 

TTT ATC GAT ACA AAT GCG GCA ACG TCG CCG GTC AAT GTG GAT TGT AAA 1671 
Phe Ile Asp Thr Asn Ala Ala Thr Ser Pro Val Asn Val Asp Cys Lys 
480 485 490 

GTG ATG GAA GTG ACC GAA GAC AAT TTA AAG AAT CCA AAT CGG TGG AGT 1719 
Val Met Glu Val Thr Glu Asp Asn Leu Lys Asn Pro Asn Arg Trp Ser 
495 500 505 

TTT GAT GAA GCA GCG GAT CAT ATC TAC TGC CTT ATG AAG AAC GAT AGT 1767 
Phe Asp Glu Ala Ala Asp His Ile Tyr Cys Leu Met Lys Asn Asp Ser 
510 515 520 

TAT CAA CGC TTT CTT CGT TCA GAA ATT TAT AAG GAT TTA GTA TTA CAA 1815 
Tyr Gln Arg Phe Leu Arg Ser Glu Ile Tyr Lys Asp Leu Val Leu Gln 
525 530 535 

TCA AGA AAG AAG GTA AGT CTC AAT TGC TCG TTT TCC ATT TTT GCA TCT T 1864 
Ser Arg Lys Lys Val Ser Leu Asn Cys Ser Phe Ser Ile Phe Ala Ser 
540 545 550 555 

GATTCCTCTG AAACCCCTTT CAGTTCCGGT TTTAGCTTAG TTTGATTCCC ACCTTTTTTC 1924 

CCTTCCCTTC CCCCATGAAT GTTTTCTTTT CACACTATGA GATATGTGTT TCATCTATTT 1984 

TTCCGATTGA AAGCTTACTG AATGCTCGCT GAAAAACTTC AAATAACAAA CTCAGACCAA 2044 

ATAACATCAA AGTTCGAGCA ATTTATTTTT TTTATACCAA AAGCATGTTC AATTGAATAT 2104 

CCCATTCAGT CACTAACACT CTGATTTCAT TCAGTTAATT ATATTTTTAC AAGTAGGATC 2164 

AATACACCTC AATCCCAATC AATCTAACAC ATGTTCATCC CGATCTCACT AAAATTTCAA 2224 

CATTTAATAT TTCCAATCCA AAACCTAAAA CGTTAAACAT TTGATCTTGT TTCAAATTCA 2284 

AAATTTTCTA ACATTGATTC AGACAACGTT TACCTCACTG ATTGCTCGTA AAGCATCGCG 2344 

ACGCATCGGA TCGACAATGT CGCGGAGCTC GCAGAGCAAC AAAACTCTGC ATGCGAGCGC 2404 

CTCTCTCGGC TCGGCGCTTT CCGGTCACGG CTCTTCCACA TCATCAATGC TCACCGCCGG 2464 

AGGAGCGGCG TCGAGCCAGA ATCTGCTGCT CGCCCCGCCA CAACATCATC TGTATGTGCC 2524 

CTCACTCTCT CTCTCATACA CTCACACTCA ACACTCACTC CCAATGAAAT GCAGAATGAA 2584 

TGTAGTCTTT TGACAGAAAT TGTGGAGAAT AGGGATGAGG AAAAATGAGG AAAGATATAA 2644 

GTTTAAAACT TGAAAAACGT TCCAAAAATT GAAACCAATA TTCATTTCTT TCAATATCTC 2704 

TGATCTTTCC AACAAGTCCG GTTCATTCCA CAGACTTTGC AAAATCTCTG TAAAATTTTC 2764 

CTACTTTTTC TTGACGCAAC TATGTTCATT CATGTCATTT GACTTCTCCT CTCATTGTCC 2824 

AAAATCTTGT CACTGGTTAC ATTGGTCACG TCCACAGCGT CACACATCTT GCAATAATCA 2884 

CTAATCACTT TTTGTCCTGT CACTCTCCAG TCTGCTCTTT CACTGAGTTT CACTGAAATT 2944 

TTCGAAAGCA TGTCACTTGA TTTTTTCGGT TTGCTGCTCA CATTGCACGG CCCTTTGAAT 3004 

GCACCTGTTG ACTTTGGTTT CTGGAAAATA CTGAAAATGT GTTTTGTGTG AATTTGTAAA 3064 

TCTGAAATTG CAATGATTTT GGATGATTTC ATCTTTGAGA CTGTTTGCTC TGCTATTGTC 3124 

TTCTCTGAAC TACTCGAAAA TTTGAATTGA AAAAAAAAAA AAAAA 3169 
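
The CDS feature of SEQ ID NO: 27 (199..1864) can be checked against the printed translation by translating codons. The following is a minimal sketch and is not part of the original listing. It assumes the standard genetic code applies; `CODON_TABLE` and `translate` are illustrative names. The input is the first 33 coding bases (positions 199 to 231) as printed in the listing.

```python
from itertools import product

# Standard genetic code, built in TCAG order (third base varies fastest).
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate a DNA string codon by codon; spaces and a trailing partial codon are ignored."""
    dna = dna.replace(" ", "").upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# Positions 199..231 of SEQ ID NO: 27, as printed.
first_codons = "ATG GCT CTA CCA AGA TTG AGG GTA AAT GCA AGC"
print(translate(first_codons))  # MALPRLRVNAS
```

The result corresponds to Met Ala Leu Pro Arg Leu Arg Val Asn Ala Ser, the first eleven residues printed under the coding sequence. The interval 199..1864 spans 1666 bases, that is, 555 codons plus one extra base, which is consistent with the 555 residues given for SEQ ID NO: 24.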



(2) INFORMATION FOR SEQ ID NO: 28: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 29 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: not relevant 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 28: 

Phe Glu Met Ala Gln Thr Ser Val Phe Lys Leu Met Ser Ser Asp Ser 
1 5 10 15 

Val Pro Lys Phe Leu Arg Asp Pro Lys Tyr Ser Ala Ile 
20 25 

(2) INFORMATION FOR SEQ ID NO: 29: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 29 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: not relevant 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 29: 

Phe Glu Ile Val Ser Asn Glu Met Tyr Arg Leu Met Asn Asn Asp Ser 
1 5 10 15 

Phe Gln Lys Phe Thr Gln Ser Asp Val Tyr Lys Asp Ala 
20 25 

(2) INFORMATION FOR SEQ ID NO: 30: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 119 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: not relevant 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 30: 

Ser Trp Gln Asp Ser Phe Asp Thr Leu Met Ser Phe Lys Ser Gly Gln 
1 5 10 15 

Lys Cys Phe Ala Glu Phe Leu Lys Ser Glu Tyr Ser Asp Glu Asn Ile 
20 25 30 

Leu Phe Trp Gln Ala Cys Glu Glu Leu Lys Arg Glu Lys Asn Ser Lys 
35 40 45 

Met Glu Glu Lys Ala Arg Ile Ile Tyr Glu Asp Phe Ile Ser Ile Leu 
50 55 60 

Ser Pro Lys Glu Val Ser Leu Asp Ser Lys Val Arg Glu Ile Val Asn 
65 70 75 80 

Thr Asn Met Arg Pro Thr Gln Asn Thr Phe Glu Ala Gln His 
85 90 95 

Gln Ile Tyr Gln Leu Met Ala Arg Asp Ser Tyr Pro Arg Phe Leu Thr 
100 105 110 

Ser Ile Phe Tyr Arg Glu Thr 
115 

(2) INFORMATION FOR SEQ ID NO: 31: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 119 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: not relevant 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 31: 

Gln Trp Ser Gln Ser Leu Glu Lys Leu Leu Ala Asn Gln Thr Gly Gln 
1 5 10 15 

Asn Val Phe Gly Ser Phe Leu Lys Ser Glu Phe Ser Glu Glu Asn Ile 
20 25 30 

Glu Phe Trp Leu Ala Cys Glu Asp Tyr Lys Lys Thr Glu Ser Asp Leu 
35 40 45 

Leu Pro Cys Lys Ala Glu Glu Ile Tyr Lys Ala Phe Val His Ser Asp 
50 55 60 

Ala Ala Lys Gln Ile Asn Ile Asp Phe Arg Thr Arg Glu Ser Thr Ala 
65 70 75 80 

Lys Lys Ile Lys Ala Pro Thr Pro Thr Cys Phe Asp Glu Ala Gln Lys 
85 90 95 

Val Ile Tyr Thr Leu Met Glu Lys Asp Ser Tyr Pro Arg Phe Leu Lys 
100 105 110 

Ser Asp Ile Tyr Leu Asn Leu 
115 

(2) INFORMATION FOR SEQ ID NO: 32: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 121 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: not relevant 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 32: 

Leu Trp Ser Glu Ala Phe Asp Glu Leu Leu Ala Ser Lys Tyr Gly Leu 
1 5 10 15 

Ala Ala Phe Arg Ala Phe Leu Lys Ser Glu Phe Cys Glu Glu Asn Ile 
20 25 30 

Glu Phe Trp Leu Ala Cys Glu Asp Phe Lys Lys Thr Lys Ser Pro Gln 
35 40 45 

Lys Leu Ser Ser Lys Ala Arg Lys Ile Tyr Thr Asp Phe Ile Glu Lys 
50 55 60 

Glu Ala Pro Lys Glu Ile Asn Ile Asp Phe Gln Thr Lys Thr Leu Ile 
65 70 75 80 

Ala Ala Gln Asn Ile Gln Glu Ala Thr Ser Gly Cys Phe Thr Thr Ala 
85 90 95 

Gln Lys Arg Val Tyr Ser Leu Met Glu Asn Asn Ser Tyr Pro Arg Phe 
100 105 110 

Leu Glu Ser Glu Phe Tyr Gln Asp Leu 
115 120 

(2) INFORMATION FOR SEQ ID NO: 33: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 7 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: not relevant 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 



(ix) FEATURE: 

(A) NAME/KEY: Modified-site 

(D) OTHER INFORMATION: /note= "Xaa at position 6 is L, Y, 
or F." 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 33: 

Leu Ala Cys Glu Asp Xaa Lys 
1 5 

(2) INFORMATION FOR SEQ ID NO: 34: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 9 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: not relevant 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 



(ix) FEATURE: 

(A) NAME/KEY: Modified-site 

(D) OTHER INFORMATION: /note= "Xaa at position 3 is E, D, 
T, Q, A, L, or K; Xaa at position 6 is L, D, E, K, T, G, or H; 
and Xaa at position 7 is H, R, K, Q, or D." 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 34: 

Phe Asp Xaa Ala Gln Xaa Xaa Ile Xaa 
1 5 

(2) INFORMATION FOR SEQ ID NO: 35: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 14 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 35: 
GTGCTAGCAC TGCA 

(2) INFORMATION FOR SEQ ID NO: 36: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 43 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: not relevant 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 36: 

Ser Asn Asn Ala Arg Leu Asn His Ile Leu Gln Asp Pro Ala Leu Lys 
1 5 10 15 

Leu Leu Phe Arg Glu Phe Leu Arg Phe Ser Leu Cys Glu Glu Asn Leu 
20 25 30 

Ser Phe Tyr Ile Asp Val Ser Glu Phe Thr Thr 
35 40 

(2) INFORMATION FOR SEQ ID NO: 37: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 43 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:37: 

Ser Asn Leu Asn Lys Leu Asp Tyr Val Leu Thr Asp Pro Gly Met Arg 
1 5 10 15 

Tyr Leu Phe Arg Arg His Leu Glu Lys Phe Leu Cys Val Glu Asn Leu 
20 25 30 

Asp Val Phe Ile Glu Ile Lys Arg Phe Leu Lys 
35 40 

(2) INFORMATION FOR SEQ ID NO: 38: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 118 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: not relevant 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 38: 

Ser Trp Ala Ala Gly Asn Cys Ala Asn Val Leu Asn Asp Asp Lys Gly 
1 5 10 15 

Lys Gln Leu Phe Arg Val Phe Leu Phe Gln Ser Leu Ala Glu Glu Asn 
20 25 30 

Leu Ala Phe Leu Glu Ala Met Glu Lys Leu Lys Lys Met Lys Ile Ser 
35 40 45 

Asp Glu Lys Val Ala Tyr Ala Lys Glu Ile Leu Glu Thr Tyr Gln Gly 
50 55 60 

Ser Ile Asn Leu Ser Ser Ser Ser Met Lys Ser Leu Arg Asn Ala Val 
65 70 75 80 

Ala Ser Glu Thr Leu Asp Met Glu Glu Phe Ala Pro Ala Ile Lys Glu 
85 90 95 

Val Arg Arg Leu Leu Glu Asn Asp Gln Phe Pro Arg Phe Arg Arg Ser 
100 105 110 

Glu Leu Tyr Leu Glu Tyr 
115 

(2) INFORMATION FOR SEQ ID NO:39: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 123 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: not relevant 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 39: 

Lys Trp Ala Gln Ser Phe Glu Gly Leu Leu Gly Asn His Val Gly Arg 
1 5 10 15 

His His Phe Arg Ile Phe Leu Arg Ser Ile His Ala Glu Glu Asn Leu 
20 25 30 

Arg Phe Trp Glu Ala Val Val Glu Phe Arg Ser Ser Arg His Lys Ala 
35 40 45 

Asn Ala Met Asn Asn Leu Gly Lys Val Ile Leu Ser Thr Tyr Leu Ala 
50 55 60 

Glu Gly Thr Thr Asn Glu Val Phe Leu Pro Phe Gly Val Arg Gln Val 
65 70 75 80 

Ile Glu Arg Arg Ile Gln Asp Asn Gln Ile Asp Ile Thr Leu Phe Asp 
85 90 95 

Glu Ala Ile Lys His Val Glu Gln Val Leu Arg Asn Asp Pro Tyr Val 
100 105 110 

Arg Phe Leu Gln Ser Ser Gln Tyr Ile Asp Leu 
115 120 

(2) INFORMATION FOR SEQ ID NO: 40: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 420 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: not relevant 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 40: 

Leu Ser Lys Ile Pro Ser Val Phe Ser Gly Ser Asp Ile Val Gln Trp 
1 5 10 15 

Leu Ile Lys Asn Leu Thr Ile Glu Asp Pro Val Glu Ala Leu His Leu 
20 25 30 

Gly Thr Leu Met Ala Ala His Gly Tyr Phe Phe Pro Ile Ser Asp His 
35 40 45 

Val Leu Thr Leu Lys Asp Asp Gly Thr Phe Tyr Arg Phe Gln Thr Pro 
50 55 60 

Tyr Phe Trp Pro Ser Asn Cys Trp Glu Pro Glu Asn Thr Asp Tyr Ala 
65 70 75 80 

Val Tyr Leu Cys Lys Arg Thr Met Gln Asn Lys Ala Arg Leu Glu Leu 
85 90 95 

Ala Asp Tyr Glu Ala Glu Ser Leu Ala Arg Leu Gln Arg Ala Phe Ala 
100 105 110 

Arg Lys Trp Glu Phe Ile Phe Met Gln Ala Glu Ala Gln Ala Lys Val 
115 120 125 

Asp Lys Lys Arg Asp Lys Ile Glu Arg Lys Ile Leu Asp Ser Gln Glu 
130 135 140 

Arg Ala Phe Trp Asp Val His Arg Pro Val Pro Gly Cys Val Asn Thr 
145 150 155 160 

Thr Glu Val Asp Ile Lys Lys Ser Ser Arg Met Arg Asn Pro His Lys 
165 170 175 

Thr Arg Lys Ser Val Tyr Gly Leu Gln Asn Asp Ile Arg Ser His Ser 
180 185 190 

Pro Thr His Thr Pro Thr Pro Glu Thr Lys Pro Pro Thr Glu Asp Glu 
195 200 205 

Leu Gln Gln Gln Ile Lys Tyr Trp Gln Ile Gln Leu Asp Arg His Arg 
210 215 220 

Leu Lys Met Ser Lys Val Ala Asp Ser Leu Leu Ser Tyr Thr Glu Gln 
225 230 235 240 

Tyr Leu Glu Tyr Asp Pro Phe Leu Leu Pro Pro Asp Pro Ser Asn Pro 
245 250 255 

Trp Leu Ser Asp Asp Thr Thr Phe Trp Glu Leu Glu Ala Ser Lys Glu 
260 265 270 

Pro Ser Gln Gln Arg Val Lys Arg Trp Gly Phe Gly Met Asp Glu Ala 
275 280 285 

Leu Lys Asp Pro Val Gly Arg Glu Gln Phe Leu Lys Phe Leu Glu Ser 
290 295 300 

Glu Phe Ser Ser Glu Asn Leu Arg Phe Trp Leu Ala Val Glu Asp Leu 
305 310 315 320 

Lys Lys Arg Pro Ile Lys Glu Val Pro Ser Arg Val Gln Glu Ile Trp 
325 330 335 

Gln Glu Phe Leu Ala Pro Gly Ala Pro Ser Ala Ile Asn Leu Asp Ser 
340 345 350 

Lys Ser Tyr Asp Lys Thr Thr Gln Asn Val Lys Glu Pro Gly Arg Tyr 
355 360 365 

Thr Phe Glu Asp Ala Gln Glu His Ile Tyr Lys Leu Met Lys Ser Asp 
370 375 380 

Ser Tyr Pro Arg Phe Ile Arg Ser Ser Ala Tyr Gln Glu Leu Leu Gln 
385 390 395 400 

Ala Lys Lys Lys Gly Lys Ser Leu Thr Ser Lys Arg Leu Thr Ser Leu 
405 410 415 

Ala Gln Ser Tyr 
420 

(2) INFORMATION FOR SEQ ID NO: 41: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1913 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 41: 

TCTTTCCAAG ATACCTAGCG TCTTCTCTGG TTCAGACATT GTTCAATGGT TGATAAAGAA 60 

CTTAACTATA GAAGATCCAG TGGAGGCGCT CCATTTGGGA ACATTAATGG CTGCCCACGG 120 

CTACTTCTTT CCAATCTCAG ATCATGTCCT CACACTCAAG GATGATGGCA CCTTTTACCG 180 

GTTTCAAACC CCCTATTTTT GGCCATCAAA TTGTTGGGAG CCGGAAAACA CAGATTATGC 240 

CGTTTACCTC TGCAAGAGAA CAATGCAAAA CAAGGCACGA CTGGAGCTCG CAGACTATGA 300 

GGCTGAGAGC CTGGCCAGGC TGCAGAGAGC ATTTGCCCGG AAGTGGGAGT TCATTTTCAT 360 

GCAAGCAGAA GCACAAGCAA AAGTGGACAA GAAGAGAGAC AAGATTGAAA GGAAGATCCT 420 

TGACAGCCAA GAGAGAGCGT TCTGGGACGT GCACAGGCCC GTGCCTGGAT GTGTAAATAC 480 

AACTGAAGTG GACATTAAGA AGTCATCCAG AATGAGAAAC CCCCACAAAA CACGGAAGTC 540 

TGTCTATGGT TTACAAAATG ATATTAGAAG TCACAGTCCT ACCCACACAC CCACACCAGA 600 

AACTAAACCT CCAACAGAAG ATGAGTTACA AGAACAGATA AAATATTGGC AAATACAGTT 660 

AGATAGACAT CGGTTAAAAA TGTCAAAAGT CGCTGACAGT CTACTAAGTT ACACGGAACA 720 

GTATTTAGAA TACGACCCGT TTCTTTTGCC ACCTGACCCT TCTAACCCAT GGCTGTCCGA 780 

TGACACCACT TTCTGGGAAC TTGAGGCAAG CAAAGAACCG AGCCAGCAGA GGGTAAAACG 840 

ATGGGGTTTT GGCATGGACG AGGCATTGAA AGACCCAGTT GGGAGAGAAC AGTTCCTTAA 900 

ATTTCTAGAG TCAGAATTCA GCTCGGAAAA TTTAAGATTC TGGCTGGCAG TGGAGGACCT 960 

GAAAAAGAGG CCTATTAAAG AAGTACCCTC AAGAGTTCAG GAAATATGGC AAGAGTTTCT 1020 

GGCTCCCGGA GCCCCCAGTG CTATTAACTT GGATTCCAAG AGTTATGACA AAACCACACA 1080 

GAACGTGAAG GAACCTGGAC GATACACATT TGAAGATGCT CAGGAGCACA TTTACAAACT 1140 

GATGAAAAGT GATTCATACC CACGTTTTAT AAGATCCAGT GCCTATCAGG AGCTTCTACA 1200 

GGCAAAGAAA AAGGGGAAAT CTCTCACGTC CAAGAGGTTA ACAAGCCTTG CTCAGTCTTA 1260 

CTAAACGGAT CATCTTGTAG CATGAATGCA GAC2GGAUTC ACTGUACAGA CTTTGTAGCT 1320 

CAATGTTGTG ACCTGGAGCA GAGGACATTA GAACAAGATG TTGCATGAGC AAAGGACCTA 1380 

AATTGTTATT TTTGTGTGTA CATTCCATCT CCAATGGACT CTTCCGTCTC AATGCCTCCA 1440 

TTCCAAACTG TTGTCTGCTT TCTTTCTCCT TCTACTATGC TGGATCTGTG TCTCTTCCTT 1500 

TTTAACAAGT TCAAGTGAAG TAAAACCTTT TCTTTTTTTC CTTCTTTCTC TCTCTCTCTC 1560 

TCTCAAAGCT TCAGTTAGAC ACACAGTTCA CTGAAAATTC AGTCAGTCAA AAACTGGAAG 1620 

AACTGTAAAA GAAAAAAGTA TATATCAATA AGTATACATG TGGCTTCACA TTTATTAAAC 1680 

AATAAATTCC GCACAGAAAG TTTCATTTCA CCAATGTGTC ACAGTCAGAA ACAAACTCAT 1740 

GTCTTCGTCT GTTGTCTGTA CATTCTCCGT TAATGTTTCT CGCATTTATT TTTATACCAT 1800 

ATTTAAAGAA GAAACACCTT TTACTCCAAA TGTATTAAAG TTGATCCCTT CTCTGTAAAT 1860 

TTGTGTATGT TTATATTGTT GTTTTATCTT TCATTGAAAG ATGCAGAATC TCC 1913 

Claims 

1. Substantially pure nucleic acid encoding an 
RGS polypeptide. 

2. The nucleic acid of claim 1, wherein said 
nucleic acid encodes the egl-10 gene. 

3. The nucleic acid of claim 1, wherein said 
nucleic acid encodes the human rgs2 gene. 

4. The nucleic acid of claim 1, wherein said 
nucleic acid is genomic DNA. 

5. The nucleic acid of claim 1, wherein said 
nucleic acid is cDNA. 

6. Substantially pure DNA having the sequence of 
Fig. 2A, or degenerate variants thereof, said DNA encoding 
the amino acid sequence of the open reading frame of Fig. 2. 

7. A DNA sequence substantially identical to the 
DNA sequence shown in Figure 2A. 

8. Substantially pure DNA having about 50% or 
greater sequence identity to the DNA sequence of Fig. 2A. 

9. A DNA sequence substantially identical to a 
nucleotide sequence in Fig. 1 (SEQ ID NO: 41). 

10. Substantially pure DNA having the sequence of 
Fig. 3C (SEQ ID NO: 40), or degenerate variants thereof, 
said DNA encoding the amino acid sequence of the open 
reading frame of Fig. 3C (SEQ ID NO: 40). 

11. Substantially pure DNA encoding a polypeptide 
having about 30% or greater sequence identity to the 
polypeptide encoded by the DNA sequence of Fig. 7 (SEQ ID 
NO: 41). 

12. The nucleic acid of claim 1, wherein said 
nucleic acid is operably linked to regulatory sequences 
for expression of said polypeptide, and 
wherein said regulatory sequences comprise a 
promoter. 

13. The DNA of claim 12, wherein said promoter is 
a constitutive promoter inducible by one or more external 
agents, or is cell-type specific. 

14. A vector comprising the DNA of claim 1, said 
vector being capable of directing expression of the 
peptide encoded by said DNA in a vector-containing cell. 

15. A substantially pure oligonucleotide 
comprising the sequence: 

5' GNIGANAARYTIGANTTRTGG 3', wherein N is G or A; 
R is T or C; and Y is A, T, or C (SEQ ID NO: 2). 

16. A substantially pure oligonucleotide 
comprising the sequence: 

5' GNIGANAARYTISGITTRTGG 3', wherein N is G or A; 
R is T or C; Y is A, T, or C; and S is A or C (SEQ ID NO: 3). 

17. A substantially pure oligonucleotide 
comprising the sequence: 

5' GNTAIGANTRITTRTRCAT 3', wherein N is G or A; 
and R is T or C (SEQ ID NO: 4). 

18. A substantially pure oligonucleotide 
comprising the sequence: 

5' GNTANCTNTRITTRTRCAT 3', wherein N is G or A; 
and R is T or C (SEQ ID NO: 5). 

19. A recombinant gene comprising a combination 
of any two or more sequences of claims 15, 16, 17, and 18. 

20. A cell which contains the nucleic acid of 
claim 1. 

21. The cell of claim 20, said cell being 
selected from the group consisting of a bacterial cell, a 
yeast cell, and a mammalian cell. 

22. The cell of claim 21, wherein said cell 
further contains an rgs gene operably linked to 
regulatory DNA comprising a promoter. 

23. The cell of claim 22, wherein said promoter 
is selected from the group consisting of a constitutive 
promoter, an inducible promoter, and a cell-type specific 
promoter. 

24. A transgenic animal which contains the 
nucleic acid of claim 1 integrated into the genome of 
said animal, wherein said nucleic acid is DNA, and said 
DNA is expressed in the somatic cells and the germ cells 
of said transgenic animal. 

25. A cell from a transgenic animal of claim 24. 

26. A method of controlling a heterotrimeric
G-protein mediated event in a cell, said method comprising
introducing into said cell the nucleic acid of claim 1 in
a manner effective to alter said G-protein mediated
event.

27. The method of claim 26, wherein said event is
G-protein signalling.

28. The method of claim 26, wherein said nucleic
acid is selected from the group consisting of nucleic
acid encoding an RGS, BL34/IR20, GOS8, and C05B.7
polypeptides, said nucleic acid positioned for expression
in said cell.

29. A method of regulating G-protein signalling
in a cell, said method comprising providing to said cell
an effective amount of an RGS polypeptide.

30. The method of claim 29, wherein said
polypeptide is selected from the group consisting of
RGS, BL34/IR20, GOS8, and C05B.7 polypeptides.

31. A method of detecting an rgs gene in a cell,
said method comprising:

contacting the DNA of claim 1 or a portion thereof
greater than 18 nucleic acids in length with a
preparation of genomic DNA from said cell under
hybridization conditions providing detection of DNA
sequences having 50% or greater sequence identity to the
sequence of any one of the sequences of SEQ ID NOS: 2
through 5.



32. A method of producing an RGS polypeptide
comprising:

providing a cell transformed with DNA encoding an
RGS polypeptide positioned for expression in said cell;

culturing said transformed cell under conditions
for expressing said DNA; and

isolating said RGS polypeptide.



33. A method of isolating an rgs gene or portion
thereof from a cell, said rgs gene having sequence
identity to the RGS conserved region, said method
comprising:

amplifying by PCR said rgs gene or a portion
thereof using oligonucleotide primers wherein said
primers

(a) are each greater than 13 nucleotides in
length;

(b) each have regions of complementarity to
opposite DNA strands in a region of the nucleotide
sequence of SEQ ID NO: 1; and

(c) contain sequences capable of producing
restriction enzyme cut sites in the amplified product;
and

isolating said rgs gene or portion thereof.

34. A method of isolating an rgs gene or fragment
thereof from a cell, comprising:

(a) providing a sample of DNA from said cell;

(b) providing a pair of oligonucleotides
having sequence identity to a conserved region of an rgs
gene;

(c) combining said pair of oligonucleotides
with said DNA sample under conditions suitable for
polymerase chain reaction-mediated DNA amplification; and

(d) isolating said amplified rgs gene or
fragment thereof.

35. The method of claim 34, wherein said
amplification is carried out using a reverse-transcription
polymerase chain reaction.

36. The method of claim 34, wherein said reverse-transcription
polymerase chain reaction is RACE.

37. A method of identifying an rgs gene in a
cell, comprising:

(a) providing a preparation of DNA from said cell;

(b) providing a detectably-labelled DNA sequence
having at least 50% identity to a conserved region of an
rgs gene;

(c) contacting said preparation of DNA with said
detectably-labelled DNA sequence under hybridization
conditions providing detection of genes having 50% or
greater sequence identity; and

(d) identifying an rgs gene by its association
with said detectable label.

38. The method of claim 37, wherein said DNA
sequence is produced according to the method of claim 45.

39. The method of claim 37, wherein said 
preparation of DNA is isolated from a human genome. 

40. A method of isolating an rgs gene from a
recombinant DNA library, comprising:

(a) providing a recombinant DNA library;

(b) contacting said recombinant DNA library with a
detectably-labelled gene fragment produced according to
the method of claim 45 under hybridization conditions
providing detection of genes having 50% or greater
sequence identity; and

(c) isolating a member of an rgs gene by its
association with said detectable label.



41. A method of isolating an rgs gene from a
recombinant DNA library, comprising:

(a) providing a recombinant DNA library;

(b) contacting said recombinant DNA library with a
detectably-labelled oligonucleotide of any of claims 15-19
under hybridization conditions providing detection of
genes having 50% or greater sequence identity; and

(c) isolating an rgs gene by its association with
said detectable label.



42. An rgs gene isolated according to the method
comprising:

(a) providing a sample of DNA;

(b) providing a pair of oligonucleotides having
sequence homology to a conserved region of an rgs gene;

(c) combining said pair of oligonucleotides with
said DNA sample under conditions suitable for polymerase
chain reaction-mediated DNA amplification; and

(d) isolating said amplified rgs gene or fragment
thereof.



43. An rgs gene isolated according to the method
comprising:

(a) providing a preparation of DNA;

(b) providing a detectably-labelled DNA sequence
having homology to a conserved region of an rgs gene;

(c) contacting said preparation of DNA with said
detectably-labelled DNA sequence under hybridization
conditions providing detection of genes having 50% or
greater sequence identity; and

(d) identifying an rgs gene by its association
with said detectable label.

44. An rgs gene isolated according to the method
comprising:

(a) providing a recombinant DNA library;

(b) contacting said recombinant DNA library with a
detectably-labelled gene fragment produced according to
the method of claims 15-19 under hybridization conditions
providing detection of genes having 50% or greater
sequence identity; and

(c) isolating an rgs gene by its association with
said detectable label.

45. A method of identifying an rgs gene
comprising:

(a) providing a cell;

(b) introducing by transformation into said cell
sample a candidate rgs gene;

(c) expressing said candidate rgs gene within said
cell sample; and

(d) determining whether said cell sample exhibits
an altered G-protein signalling response, whereby an
altered response identifies an rgs gene.

46. The method of claim 45, wherein said cell
comprises a smooth muscle cell, a neutrophil, a myeloid cell, an
insulin secreting β-cell, a COS-7 cell, or a
Xenopus oocyte.

47. The method of claim 45, wherein said
candidate rgs gene is obtained from a cDNA expression
library.

48. The method of claim 45, wherein said
G-protein signalling response is the membrane trafficking
response, the secretion response, or the [3H]IP3
response.

49. An rgs gene isolated according to the method
comprising:

(a) providing a cell sample;

(b) introducing by transformation into said cell
sample a candidate rgs gene;

(c) expressing said candidate rgs gene within said
cell sample; and

(d) determining whether said cell sample exhibits
an altered G-protein signalling response, whereby an
altered response identifies an rgs gene.

50. A substantially pure RGS polypeptide. 



51. The polypeptide of claim 50, comprising an
amino acid sequence substantially identical to an amino
acid sequence shown in SEQ ID NO: 27.

52. The polypeptide of claim 50, comprising an
amino acid sequence substantially identical to an amino
acid sequence shown in SEQ ID NO: 40.

53. A recombinant polypeptide capable of
regulating G-protein mediated signalling, wherein said
polypeptide comprises a region with substantial identity
to the polypeptide sequences of SEQ ID NOS: 25 and 26.

54. A substantially pure polypeptide comprising
the sequence:

Xaa1 Xaa2 Xaa3 Glu Xaa4 Xaa5 Xaa6 Xaa7, wherein
Xaa1 is I, L, E, or V, preferably L; Xaa2 is A, S, or E,
preferably A; Xaa3 is C or V, preferably C; Xaa4 is D, E,
N, or K, preferably D; Xaa5 is L, Y, or F; Xaa6 is K or R,
preferably R; and Xaa7 is K, R, Y, or F, preferably K
(SEQ ID NO: 25).

55. A substantially pure polypeptide comprising
the sequence:

Xaa1 Xaa2 Xaa3 Xaa4 Xaa5 Xaa6 Xaa7 Xaa8 Xaa9 Xaa10
Lys, wherein Xaa1 is F or L, preferably F; Xaa2 is D, E,
T, or Q, preferably D; Xaa3 is E, D, T, Q, A, L, or K;
Xaa4 is A or L, preferably A; Xaa5 is Q or A, preferably
Q; Xaa6 is L, D, E, K, T, G, or H; Xaa7 is H, R, K, Q or D;
Xaa8 is I or V, preferably I; Xaa9 is Q, T, S, N, K, M, G
or A (SEQ ID NO: 26).

56. A purified antibody which binds specifically
to an RGS family protein.

57. A substantially pure polypeptide having a
sequence substantially identical to an amino acid
sequence shown in Figure 3B, SEQ ID NOS: 6-14.

58. A kit for screening for detecting compounds
which regulate G-protein signalling, said kit comprising
RGS encoding DNA positioned for expression in a cell.

59. The kit of claim 58, wherein said cell is a
cardiac myocyte, a mast cell, or a neutrophil.

60. A method for detecting a compound which
regulates G-protein signalling, said method comprising:

i) providing a cell having RGS encoding DNA
positioned for expression;

ii) contacting said cell with the compound to be
tested;

iii) monitoring said cell for an alteration in
G-protein signalling response.

61. The method of claim 60, wherein said cell is 
a cardiac myocyte, a mast cell, or a neutrophil. 

62. The method of claim 60, wherein said response
is an electrophysical response, a degranulation response,
or IL-8 response.

63. Use of an RGS polypeptide for the manufacture
of a medicament for regulating G-protein signalling in a
cell.

64. Use of a nucleic acid encoding an RGS
polypeptide for the manufacture of a medicament for
regulating G-protein signalling in a cell.

Sequence of human rgs2

SEQ ID NO: 41

[Nucleotide sequence listing; the residues are not legible in this copy.]